Practitioner Guide | May 20, 2026 | 13 min read

How to Write a Penetration Testing Report: Structure, Finding Quality, and Remediation Recommendations

Sources: PTES: Penetration Testing Execution Standard | OWASP Testing Guide: Reporting | CVSS v4.0 Specification | SANS: Writing a Penetration Testing Report

Eric Bang

Founder & Cybersecurity Evangelist

60%

Of pentest findings are not remediated within 90 days of reporting (Cobalt Core Data)

Executive summary

The section most-read by decision-makers who control remediation budget

CVSS 4.0

Current scoring standard replacing CVSS 3.1

Steps to reproduce

The single most important element of an actionable finding

A penetration test that produces a weak report is a partial engagement. The test is complete; the value is not. Findings that cannot be reproduced, risk ratings that are not calibrated to organizational context, and remediation recommendations that describe what to fix without explaining how will sit in a backlog until the next assessment. The best pentest reports drive remediation: a developer reads a finding and knows exactly what the vulnerability is, why it matters, how to verify it exists, and what to do about it. A CISO reads the executive summary and understands the overall risk posture without needing to read every technical finding. This guide covers the structure and content quality standards that distinguish reports that drive action from reports that get filed.

Report Structure: What Every Penetration Testing Report Must Include

A complete penetration test report has four primary sections, each serving a different audience.

1. Cover page and engagement metadata:

Client name, assessment type (web application, internal network, red team, etc.)
Assessment dates (testing window start and end)
Report date and version number
Tester names and certifications
Classification level (Confidential -- Restricted Distribution)
Document control table (version history, review status)

2. Executive summary (2-4 pages): Written for a CISO, CTO, or board-level audience. No technical jargon. Covers: overall risk posture, the two or three most critical findings and their business impact, the attack narrative in plain language, and the recommended priority remediation actions. The executive summary should be self-contained -- a reader who reads nothing else should still understand the key risks.

3. Technical findings (the main body): One finding per page (or section) with standardized structure. Each finding includes: title, severity rating, CVSS score, affected systems, description, steps to reproduce, evidence (screenshots, request/response pairs, tool output), business impact, and remediation recommendation.

4. Appendices:

Full scope list (all tested URLs, IP ranges, applications)
Methodology summary (testing framework followed, tools used)
Vulnerability scoring methodology
Glossary (for client organizations with limited security background)
Tool output (raw Nmap scans, Burp Suite logs, if requested)

What separates good reports from mediocre ones: The quality of the findings in the main body. Finding count and severity distribution are secondary to whether each finding is written clearly enough for the remediation engineer to understand the root cause without asking follow-up questions.

Executive Summary: Writing for Non-Technical Stakeholders

The executive summary is not a condensed technical summary. It is a business risk communication document. Translate technical findings into business language without losing accuracy.

Structure for the executive summary:

Opening paragraph -- Overall assessment conclusion: State the overall risk level and what it means for the organization. "During the assessment period, [Company] demonstrated [strong/adequate/insufficient] security controls across [tested scope]. Testers identified [X critical, Y high, Z medium, N low] findings, of which [the two most critical] represent the most significant risk to [business impact]."

Attack narrative (if applicable): For assessments where the tester achieved significant compromise, describe the attack path in plain language: "Beginning with publicly accessible credentials obtained from a breach dataset, the assessment team gained access to the company's internal network, escalated privileges to domain administrator within [timeframe], and demonstrated the ability to access [specific sensitive data or systems]." This narrative is more impactful than a finding count table because it shows the realistic attacker perspective.

Key findings summary: Two to four bullet points covering the highest-severity findings in business language. Not: "SQL injection vulnerability in the login endpoint (CVSS 9.8)." Instead: "An attacker with internet access could extract the complete customer database, including [X million] customer records and [specific sensitive data], without any login credentials."

Remediation priority: A prioritized list of the two or three actions the organization should take immediately (within 30 days) versus medium-term actions (30-90 days). This gives decision-makers a clear action framework without requiring them to read every technical finding.

Avoid in the executive summary: CVE identifiers, tool names, CVSS scores, port numbers, and specific technical implementation details. These belong in the technical findings section. Executive summary language should pass the "would a CFO understand this?" test.

Finding Write-Up Format: Reproducing, Rating, and Remediating

Each finding should follow a consistent structure. Inconsistency in finding format signals a low-quality report and makes it harder for remediation teams to process findings systematically.

Standard finding fields:

Title: Short, specific, descriptive. "SQL Injection in /api/users/search endpoint" is better than "SQL Injection" or "Database Vulnerability." The title should identify the vulnerability class and the specific location.

Severity: Critical, High, Medium, Low, Informational. Defined by CVSS score range or organizational risk rating methodology. Consistent with the vulnerability scoring section of the report.

CVSS Score and Vector: Include the full CVSS 4.0 vector string so the reader can verify the score calculation. Example: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H. If the organizational context modifies the CVSS-derived score (a critical finding in an isolated test environment may be downgraded to High for the production risk rating), document the rationale.

Affected systems: Specific URLs, IP addresses, hostnames, or application components. Not "the web application" -- the specific endpoint, API, or system component.

Description: Explain what the vulnerability is, why it exists, and what the security impact is. This should be understandable to a developer who is not a security specialist. Avoid jargon without definition.

Steps to reproduce: This is the most important section for remediation quality. Provide numbered, copy-paste-quality steps a developer can follow to reproduce the finding in a test environment. Include: the specific HTTP request (or other protocol), any required authentication context (session token, user account), the observed vulnerable response, and comparison to an expected secure response. If you used a tool, include the specific command.

Evidence: Screenshots, HTTP request/response pairs (in raw format, not just screenshots), tool output. Evidence should be sufficient to verify the finding exists without rerunning the test. Redact any sensitive data (PII, credentials, internal IPs) that is not necessary to demonstrate the vulnerability.

Business impact: Translate the technical impact into business risk. Data confidentiality impact, integrity impact, availability impact. If the finding enables access to specific sensitive data (PII, financial records, credentials), name it.

Remediation recommendation: See the dedicated section below.
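One way to keep finding format consistent is to hold each finding in a fixed structure and check it for gaps before it reaches the report. The following is a minimal sketch, not a prescribed tool: the `Finding` class, its field names, and the `problems` method are illustrative assumptions that mirror the standard fields above.

```python
from dataclasses import dataclass, fields

SEVERITIES = ("Critical", "High", "Medium", "Low", "Informational")


@dataclass
class Finding:
    """One technical finding, mirroring the standard finding fields (hypothetical names)."""
    title: str
    severity: str
    cvss_vector: str
    affected_systems: list[str]
    description: str
    steps_to_reproduce: list[str]
    evidence: list[str]
    business_impact: str
    remediation: str

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means every field is filled in."""
        issues = []
        if self.severity not in SEVERITIES:
            issues.append(f"unknown severity: {self.severity!r}")
        for f in fields(self):
            if getattr(self, f.name):
                continue
            # Informational findings carry no CVSS score, so an empty vector is fine there.
            if f.name == "cvss_vector" and self.severity == "Informational":
                continue
            issues.append(f"missing field: {f.name}")
        return issues


draft = Finding(
    title="SQL Injection in /api/users/search endpoint",
    severity="Critical",
    cvss_vector="CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H",
    affected_systems=["/api/users/search"],
    description="User-supplied input is concatenated into a SQL query.",
    steps_to_reproduce=[],  # not written yet
    evidence=["raw request/response pair"],
    business_impact="Unauthenticated read access to the users table.",
    remediation="Parameterize the query.",
)
print(draft.problems())  # ['missing field: steps_to_reproduce']
```

A check like this only catches empty fields. Whether the steps actually reproduce the finding still has to be verified by hand in a clean environment.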

CVSS Scoring and Risk Rating Calibration

CVSS (Common Vulnerability Scoring System) provides a standardized method for rating vulnerability severity. CVSS 4.0, released in 2023, replaces CVSS 3.1 with a more granular scoring model.

CVSS 4.0 base metric groups:

Exploitability metrics: Attack Vector (AV), Attack Complexity (AC), Attack Requirements (AT), Privileges Required (PR), User Interaction (UI)
Impact metrics: Vulnerable System Confidentiality (VC), Vulnerable System Integrity (VI), Vulnerable System Availability (VA), Subsequent System Confidentiality (SC), Subsequent System Integrity (SI), Subsequent System Availability (SA)
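Because the vector string is what lets a reader verify the score, it is worth checking that every vector in the report is well-formed. The sketch below validates only the eleven base metrics listed above; the function name `parse_base_vector` is an assumption for illustration, it does not check metric order, it does not validate threat, environmental, or supplemental metrics, and it does not compute a score (use the official calculator for that).

```python
# Allowed values for the eleven CVSS 4.0 base metrics.
BASE_METRICS = {
    "AV": {"N", "A", "L", "P"},
    "AC": {"L", "H"},
    "AT": {"N", "P"},
    "PR": {"N", "L", "H"},
    "UI": {"N", "P", "A"},
    "VC": {"H", "L", "N"},
    "VI": {"H", "L", "N"},
    "VA": {"H", "L", "N"},
    "SC": {"H", "L", "N"},
    "SI": {"H", "L", "N"},
    "SA": {"H", "L", "N"},
}


def parse_base_vector(vector: str) -> dict[str, str]:
    """Return the base metrics of a CVSS 4.0 vector, or raise ValueError if it is malformed."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"expected a CVSS:4.0 prefix, got {prefix!r}")

    metrics: dict[str, str] = {}
    for part in parts:
        name, sep, value = part.partition(":")
        if not sep or name in metrics:
            raise ValueError(f"malformed or duplicate metric: {part!r}")
        metrics[name] = value

    for name, allowed in BASE_METRICS.items():
        if name not in metrics:
            raise ValueError(f"missing base metric: {name}")
        if metrics[name] not in allowed:
            raise ValueError(f"invalid value for {name}: {metrics[name]!r}")

    # Metrics outside the base group are ignored by this sketch.
    return {name: metrics[name] for name in BASE_METRICS}


parsed = parse_base_vector(
    "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H"
)
print(parsed["AV"], parsed["PR"])  # N N
```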

Severity ranges (CVSS 4.0): None (0.0), Low (0.1-3.9), Medium (4.0-6.9), High (7.0-8.9), Critical (9.0-10.0)
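The ranges above translate directly into a lookup, which helps keep the severity label and the numeric score from drifting apart across a long report. This is a small sketch; the function name `severity_for` is an assumption.

```python
def severity_for(score: float) -> str:
    """Map a CVSS score to its qualitative severity label using the ranges above."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    score = round(score, 1)  # CVSS scores are reported to one decimal place
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


print(severity_for(9.8))  # Critical
print(severity_for(6.9))  # Medium
```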

CVSS calibration for pentest context: CVSS scores describe the severity of a vulnerability in isolation, not the risk to your specific organization. Adjust your reported severity based on:

Exploitability in context: A critical CVSS vulnerability in a system not exposed to the internet may be rated High in the pentest report, with a note explaining the network access requirement
Data sensitivity: The same vulnerability in a development environment (no production data) vs. production (customer PII) may be rated differently
Compensating controls: A vulnerability mitigated by a compensating control (WAF, network segmentation) should be noted, but not automatically downgraded -- compensating controls can fail

The CVSS vs. business risk gap: CVSS scores the technical severity of the vulnerability. Business risk adds: likelihood of exploitation given your threat landscape, asset criticality (the same vulnerability on a public-facing authentication endpoint vs. an internal read-only reporting dashboard represents different risk), and regulatory/compliance implications. Many pentest reports include both the CVSS score and an adjusted "organizational risk rating" that accounts for these factors.
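If your report carries both a CVSS-derived severity and an adjusted organizational rating, one simple safeguard is to refuse any adjustment that arrives without a written rationale. The sketch below assumes a hypothetical `RiskRating` record; the class and field names are illustrative, not part of any standard.

```python
from dataclasses import dataclass

LEVELS = ("Informational", "Low", "Medium", "High", "Critical")


@dataclass(frozen=True)
class RiskRating:
    """Pairs the CVSS-derived severity with the rating shown to the client."""
    cvss_severity: str
    organizational_rating: str
    rationale: str = ""

    def __post_init__(self) -> None:
        for level in (self.cvss_severity, self.organizational_rating):
            if level not in LEVELS:
                raise ValueError(f"unknown level: {level!r}")
        adjusted = self.organizational_rating != self.cvss_severity
        if adjusted and not self.rationale.strip():
            raise ValueError("an adjusted rating needs a documented rationale")


# Accepted: the downgrade is explained.
rating = RiskRating(
    cvss_severity="Critical",
    organizational_rating="High",
    rationale="Affected host is reachable only from the internal network.",
)

# Rejected: same downgrade with no explanation.
try:
    RiskRating(cvss_severity="Critical", organizational_rating="High")
except ValueError as exc:
    print(exc)  # an adjusted rating needs a documented rationale
```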

Remediation Recommendations That Engineers Can Act On

Remediation recommendations are the difference between a report that drives remediation and one that gets filed. Vague recommendations produce vague remediation. Specific, actionable recommendations produce completed tickets.

What makes a remediation recommendation actionable:

Specific, not generic: "Implement input validation" is not actionable. "Parameterize the SQL query in /api/users/search to separate query logic from user-supplied input. Replace the current string concatenation (query = 'SELECT * FROM users WHERE name=' + username) with a prepared statement using db.query('SELECT * FROM users WHERE name = ?', [username])" is actionable.
Root cause addressed, not symptoms: A WAF rule that blocks the specific SQL injection payload demonstrated in the pentest is a compensating control, not remediation. The remediation is fixing the vulnerable query. Note compensating controls as interim measures and specify the actual root cause fix.
Technology-specific: If you know the target application is running on Node.js with PostgreSQL, provide remediation examples in Node.js. Generic "use parameterized queries" advice is less useful than specific API usage for the framework in use; placeholder syntax, for instance, differs between drivers (node-postgres uses $1, $2 where other drivers use ?).
Prioritized: When multiple remediation steps are required, order them by impact and dependency. "Step 1: Parameterize the affected query (immediate, eliminates the injection vector). Step 2: Implement input validation as defense in depth. Step 3: Review all database queries in the application for similar string-concatenation patterns, for example by grepping for db\.query\(.*\+ (a query call whose argument contains a + operator)."
Verification guidance: Include how the development team can verify the remediation is complete without rerunning the full pentest. A curl command that demonstrates the finding no longer reproduces, or a unit test case that validates the secure behavior, closes the feedback loop.
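The points above can be combined into a single artifact: the vulnerable pattern, the fix, and a check the development team can rerun. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; the table, the function names, and the payload are illustrative assumptions, and a real report would show the same idea in the client's own stack and driver.

```python
import sqlite3

# Throwaway in-memory database standing in for the application's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def search_vulnerable(username: str) -> list[tuple]:
    """The pattern to remove: user input concatenated into the SQL text."""
    query = "SELECT name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def search_fixed(username: str) -> list[tuple]:
    """The fix: the driver binds the value, so it is never parsed as SQL."""
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()


payload = "' OR '1'='1"

# Before the fix, the payload rewrites the WHERE clause and returns every row.
assert len(search_vulnerable(payload)) == 2

# After the fix, the payload is treated as a literal name and matches nothing,
# while a legitimate lookup still works.
assert search_fixed(payload) == []
assert search_fixed("alice") == [("alice",)]
```

The final three assertions are the kind of check that can be handed over as a regression test, so the team can confirm the remediation without rerunning the full pentest.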

For findings requiring architectural change: Acknowledge the complexity and provide an interim compensating control alongside the long-term remediation. An insecure direct object reference vulnerability that requires refactoring the authorization model is a 30-90 day remediation; a WAF rule blocking enumerable object ID patterns is a 1-week interim measure that reduces exploitability while the proper fix is implemented.

Evidence Documentation and Report Quality Standards

Evidence quality is what separates a credible finding from a contested one. Poor evidence leads to remediation teams questioning whether findings are real, which delays action.

Evidence documentation standards:

HTTP requests and responses: Capture in raw text format, not just screenshots. Raw format allows reproduction. Include: the complete HTTP request (method, URL, headers including cookies, body), the server response (status code, response headers, response body showing the vulnerability). Use Burp Suite's "copy to clipboard" or "save item" for consistent formatting.

Screenshots: Use for UI-based evidence (reflected XSS executing in the browser, admin panel access) where raw requests alone do not fully demonstrate impact. Annotate screenshots with arrows or callouts identifying the relevant elements. Include timestamps where the date/time of testing is relevant (for demonstrating the finding was present during the testing window, not before or after).

Command-line tool output: Include the full command executed and full output, not just a truncated excerpt. Redact output that contains sensitive data not necessary to demonstrate the finding.

Sensitive data in evidence: Redact actual customer PII, real credentials, and internal IP addresses that are not necessary to demonstrate the finding. A SQL injection finding can show column names and row count without including actual customer names or email addresses.
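A first redaction pass over raw request/response evidence can be scripted. The sketch below is an assumption-laden starting point, not a complete solution: the patterns cover only email addresses, RFC 1918 private IPv4 addresses, and three credential-bearing headers, and the `redact` function name and placeholder strings are made up for illustration. Pattern-based redaction will miss things such as customer names, so a manual review is still needed, and you should keep any value that is genuinely required to demonstrate the finding.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# RFC 1918 private ranges: 10/8, 172.16/12, 192.168/16.
PRIVATE_IP = re.compile(
    r"\b(?:10(?:\.\d{1,3}){3}"
    r"|192\.168(?:\.\d{1,3}){2}"
    r"|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2})\b"
)

# Headers that usually carry credentials or session tokens.
SECRET_HEADER = re.compile(
    r"^(Authorization|Cookie|Set-Cookie):[^\r\n]*",
    re.IGNORECASE | re.MULTILINE,
)


def redact(raw: str) -> str:
    """Replace credentials, emails, and private IPs in raw HTTP evidence."""
    raw = SECRET_HEADER.sub(lambda m: f"{m.group(1)}: [REDACTED]", raw)
    raw = EMAIL.sub("[REDACTED EMAIL]", raw)
    raw = PRIVATE_IP.sub("[REDACTED IP]", raw)
    return raw


evidence = (
    "GET /api/users/search?name=test HTTP/1.1\r\n"
    "Host: 10.0.4.17\r\n"
    "Cookie: session=abc123\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "\r\n"
    '{"name": "alice", "email": "alice@example.com"}'
)
print(redact(evidence))
```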

Report quality checklist before delivery:

All CVSS scores verified against the CVSS 4.0 calculator
All steps to reproduce tested in a clean environment (not just documented from memory)
All screenshots readable at 100% zoom on a standard monitor
Spell-check and grammar-check completed (unprofessional writing reduces credibility)
All findings mapped to a testing methodology or framework (OWASP, PTES, NIST 800-115)
Client-specific context applied to risk ratings (not just raw CVSS defaults)
Remediation recommendations reviewed for technical accuracy by a second tester

The bottom line

A pentest report is only as valuable as the remediation it drives. The findings you spent weeks discovering will not be fixed if the report cannot communicate their importance clearly to the CISO, reproduce them reliably for the developer, and provide specific enough remediation guidance for an engineer to act on without follow-up questions. Invest the same quality of effort in the report as in the testing itself. An executive summary that communicates the real business risk, findings with copy-paste-quality reproduction steps, and remediation recommendations calibrated to the specific technology stack will consistently produce higher remediation rates than technically superior testing paired with a weak report.

Frequently asked questions

How long should a penetration testing report be?

Report length should be determined by finding count and complexity, not by a target page count. As a rough guide, a web application assessment with 5 findings typically produces a 15-25 page report, and an internal network assessment with 20 findings may produce a 50-80 page report. Padding reports with boilerplate methodology descriptions, tool lists, and generic vulnerability background to hit a page count target reduces report quality and wastes the client's time. The executive summary should be 2-4 pages regardless of finding count. Each technical finding should be 1-3 pages. Appendices (scope, methodology, tool output) are separate from the finding body and do not need to be inflated.

Should I include informational findings in a penetration testing report?

Yes, but clearly differentiated from scored findings. Informational findings represent security observations that are not directly exploitable vulnerabilities but indicate a security posture issue or provide relevant context: outdated TLS configuration that does not meet current best practices, HTTP security headers missing (when not directly exploitable), verbose error messages revealing stack traces, or enumerable user IDs that do not directly enable unauthorized access. Informational findings should have a distinct severity label, no CVSS score, and a clear note that they do not represent an immediately exploitable risk. They should appear after all scored findings in the report or in a dedicated appendix.
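The ordering and labeling rules in this answer are easy to enforce when findings are assembled programmatically. The sketch below assumes findings are held as simple (title, severity, score) tuples; that layout, the `report_order` function, the finding titles, and the scores are illustrative only.

```python
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Informational": 4}


def report_order(findings):
    """Sort findings for the report: scored findings first, informational last.

    Each finding is a (title, severity, cvss_score) tuple, where cvss_score
    is None for informational findings.
    """
    for title, severity, score in findings:
        if severity == "Informational" and score is not None:
            raise ValueError(f"informational finding has a CVSS score: {title!r}")
        if severity != "Informational" and score is None:
            raise ValueError(f"scored finding is missing its CVSS score: {title!r}")
    # Within one severity level, higher scores come first.
    return sorted(findings, key=lambda f: (SEVERITY_ORDER[f[1]], -(f[2] or 0.0)))


# Hypothetical findings with made-up scores.
findings = [
    ("Verbose error messages reveal stack traces", "Informational", None),
    ("IDOR in /api/orders", "High", 7.1),
    ("SQL Injection in /api/users/search endpoint", "Critical", 9.3),
    ("Missing rate limiting on login", "Medium", 5.3),
]

for title, severity, score in report_order(findings):
    print(severity, "-", title)
```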

What is the difference between a vulnerability assessment and a penetration test report?

A vulnerability assessment report lists vulnerabilities identified through automated scanning and manual verification without demonstrating exploitation. A penetration test report documents not just that the vulnerability exists but the exploitation path: how the tester used the vulnerability to achieve a specific impact (data extraction, privilege escalation, lateral movement). Pentest reports include an attack narrative showing the chain of vulnerabilities exploited in sequence, not just individual findings in isolation. The exploitation evidence (demonstrated impact) is what justifies the severity rating and drives remediation prioritization -- a vulnerability that was shown to allow extraction of customer data is remediated faster than one that is only theoretically exploitable.

How should I handle findings that the client disputes?

Document the dispute in the report with both perspectives: the tester's finding and the client's technical rationale for disputing it. If the dispute is about reproducibility, schedule a finding review session to reproduce the finding in the client's presence before the report is finalized. If the dispute is about severity rating, document the tester's assessment with rationale and note the client's alternative assessment with their rationale, then record a formal risk acceptance if the client chooses to accept the risk. Never silently remove or downgrade a finding because a client pushed back; any change must come with documented rationale. A report that reflects the actual security posture, including contested findings with documented rationale, is more valuable than a cleaned-up report that omits real risk.

What CVSS version should I use for penetration testing reports in 2026?

Use CVSS 4.0, which was released in November 2023 and is now the current standard. CVSS 4.0 adds more granular metrics including Attack Requirements (distinguishing vulnerabilities that require specific target conditions) and Subsequent System Impact (impact on systems beyond the directly vulnerable component). Many scanners and vulnerability management platforms still default to CVSS 3.1 scoring; be explicit in your methodology section which version you used and include the full vector string. If a client's vulnerability management platform does not yet support CVSS 4.0, include both the CVSS 4.0 score and a CVSS 3.1 equivalent in the finding for compatibility.

How do I write remediation recommendations for vulnerabilities with complex root causes?

Layer the remediation recommendation into three parts: immediate mitigation, root cause fix, and verification. The immediate mitigation is what can be done in 24-72 hours to reduce exploitability while the root cause fix is developed -- a WAF rule, a firewall change, disabling a specific endpoint. The root cause fix is the actual code or configuration change that eliminates the vulnerability. The verification step is a test or review that confirms the fix is complete. For architectural issues (broken authorization model, missing centralized authentication), acknowledge the remediation timeline realistically and focus the immediate section on compensating controls that reduce risk while the architectural change is planned.

Should a penetration testing report include a risk heat map or scoring dashboard?

A severity distribution chart (count of Critical, High, Medium, Low, Informational findings) in the executive summary is useful for visual context, but a risk heat map or complex scoring dashboard adds little value for most engagements. Decision-makers need to know: what are the two or three things we must fix immediately, and what is the overall risk posture. A finding count pie chart communicates the distribution; a 4x4 risk matrix with 15 data points communicates visual complexity without adding clarity. If you include visual elements, keep them simple and ensure they support the narrative rather than replacing it.
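A severity distribution needs nothing more than a count per level. The sketch below produces the counts and a plain-text bar chart; the function names and the sample data are assumptions for illustration, and any charting tool can take the same counts as input.

```python
from collections import Counter

LEVELS = ("Critical", "High", "Medium", "Low", "Informational")


def severity_distribution(severities):
    """Count findings per severity level, always in report order."""
    counts = Counter(severities)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown severity labels: {sorted(unknown)}")
    return {level: counts[level] for level in LEVELS}


def as_text_chart(distribution):
    """Render the distribution as one bar of '#' characters per level."""
    return "\n".join(
        f"{level:<13} {'#' * n} {n}" for level, n in distribution.items()
    )


# Hypothetical engagement with eight findings.
sample = ["High", "Critical", "Medium", "High", "Low", "Informational", "Medium", "High"]
distribution = severity_distribution(sample)
print(as_text_chart(distribution))
```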

Author

Eric Bang, CISSP

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
