68% of pentest engagements identify at least one critical finding that was unknown to the client before testing.
43 days is the average time between pentest completion and remediation of critical findings.
31% of organizations conduct penetration testing less than once per year, which is too infrequent for meaningful risk management.
4 hours is the median time for an experienced tester to reach a foothold from external network reconnaissance on a typical enterprise target.

A penetration test is a structured simulation of an attacker attempting to compromise your environment, conducted by authorized testers who document methods, findings, and evidence. Done well, it tells you what an attacker could do with the access they could realistically obtain — not just which CVEs are theoretically present. Done poorly, it is an automated vulnerability scan with a report wrapper.

This guide covers the standard phases of a professional penetration test, how to evaluate methodology frameworks, what tooling should be used in each phase, the scoping decisions that determine the value of the engagement, and what separates a useful pentest report from an expensive document that collects dust.

The Six Phases of a Professional Penetration Test

Professional penetration testing follows a structured sequence that maps roughly to how real attackers operate, with the critical addition of documentation and authorization at every step.

Phase 1 — Scoping and Rules of Engagement: Before any testing begins, the engagement scope is defined in writing: which IP ranges, domains, applications, and physical locations are in scope; which are explicitly out of scope; what techniques are authorized (social engineering, physical access, destructive testing); notification requirements (who must be informed before testing begins); and emergency contacts if critical systems are inadvertently affected. This document, signed by both parties, is what distinguishes authorized penetration testing from criminal activity.
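
As an illustration of how a written scope can be enforced during testing, the sketch below checks a target address against in-scope and out-of-scope ranges before any tool touches it. The `EngagementScope` class, its field names, and the example ranges are hypothetical; a real rules-of-engagement document also covers domains, applications, techniques, and contacts that this sketch does not model.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class EngagementScope:
    """Hypothetical in-memory form of a signed scope document."""
    in_scope: list[str]                                     # CIDR ranges authorized for testing
    out_of_scope: list[str] = field(default_factory=list)  # explicit exclusions

    def is_authorized(self, target: str) -> bool:
        """Return True only if target is inside an in-scope range and
        not inside any excluded range. Exclusions always win."""
        addr = ip_address(target)
        if any(addr in ip_network(cidr) for cidr in self.out_of_scope):
            return False
        return any(addr in ip_network(cidr) for cidr in self.in_scope)


# Example ranges are placeholders, not taken from any real engagement.
scope = EngagementScope(
    in_scope=["10.20.0.0/16", "203.0.113.0/24"],
    out_of_scope=["10.20.50.0/24"],  # e.g. a production database subnet
)

print(scope.is_authorized("10.20.1.5"))    # inside an authorized range
print(scope.is_authorized("10.20.50.7"))   # excluded subnet
print(scope.is_authorized("192.168.1.1"))  # never listed, so not authorized
```

Checking exclusions first means an excluded subnet stays off limits even when it sits inside a larger authorized range.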

Phase 2 — Reconnaissance: Passive and active information gathering about the target. Passive reconnaissance uses OSINT sources (WHOIS, certificate transparency logs, LinkedIn, Shodan, passive DNS data) without sending traffic to target systems. Active reconnaissance initiates contact with target systems: port scanning, service enumeration, DNS brute-forcing, and web crawling. The intelligence gathered here drives all subsequent phases.
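
To make the passive side concrete, the sketch below extracts a deduplicated hostname list from certificate transparency search results. It runs on an inline sample and makes no network requests. The `name_value` field, holding newline-separated names, is an assumption modeled on the JSON that public CT search services commonly return; adjust it to whatever your source actually provides. The function name and sample data are hypothetical.

```python
import json

# Inline sample standing in for exported CT search results (assumed format).
SAMPLE = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "VPN.example.com"},
    {"name_value": "www.example.com"},
    {"name_value": "mail.example.org"},
])


def unique_hostnames(ct_json: str, domain: str) -> list[str]:
    """Return sorted, lowercased hostnames that belong to `domain`.

    Wildcard prefixes are stripped and names outside the domain are dropped,
    which keeps the candidate list aligned with the agreed scope.
    """
    hosts = set()
    for entry in json.loads(ct_json):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name == domain or name.endswith("." + domain):
                hosts.add(name)
    return sorted(hosts)


print(unique_hostnames(SAMPLE, "example.com"))
```

The resulting list is only a set of candidates; each hostname still needs to be checked against the signed scope before active testing.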

Phase 3 — Scanning and Enumeration: Systematic mapping of discovered attack surface. Network scanning (Nmap for host discovery, port identification, and service fingerprinting), vulnerability scanning (Nessus, OpenVAS, or Nuclei for known vulnerability identification), and application enumeration (directory brute-forcing, API endpoint discovery, authentication mechanism identification).
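
Scan output becomes useful once it is in a form that later phases can consume. The sketch below pulls open ports out of Nmap's XML output (as written by `nmap -oX`) using only the Python standard library. The embedded sample is a heavily trimmed, hand-written subset of that format used for illustration, and `open_services` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed illustration of Nmap XML output; real files carry many more attributes.
SAMPLE_XML = """<nmaprun>
  <host>
    <status state="up"/>
    <address addr="203.0.113.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH"/>
      </port>
      <port protocol="tcp" portid="443">
        <state state="open"/>
        <service name="https"/>
      </port>
      <port protocol="tcp" portid="3389">
        <state state="filtered"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def open_services(nmap_xml: str) -> list[tuple[str, int, str]]:
    """Return (address, port, service name) for every port reported as open."""
    results = []
    root = ET.fromstring(nmap_xml)
    for host in root.findall("host"):
        address = host.find("address").get("addr")
        for port in host.findall("ports/port"):
            if port.find("state").get("state") != "open":
                continue  # skip closed and filtered ports
            service = port.find("service")
            name = service.get("name") if service is not None else "unknown"
            results.append((address, int(port.get("portid")), name))
    return results


for address, port, name in open_services(SAMPLE_XML):
    print(address, port, name)
```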

Phase 4 — Exploitation: Attempting to leverage identified vulnerabilities to gain unauthorized access. This is where penetration testing diverges most clearly from vulnerability assessment — testers actively attempt exploitation, not just identification. Exploitation may chain multiple vulnerabilities (low-severity misconfiguration plus known CVE plus credential reuse) to demonstrate realistic attack paths.
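
One way to reason about chained vulnerabilities is to treat each level of access as a node and each finding as an edge, then search for a path from the attacker's starting position to the objective. The sketch below does this with a breadth-first search. The states and findings are invented for illustration, and this is a simplified model rather than how any particular tool represents attack paths.

```python
from collections import deque

# Hypothetical findings: each edge is (from_state, to_state, finding_used).
EDGES = [
    ("external", "web-server user", "known CVE in exposed web application"),
    ("web-server user", "file-share read", "misconfigured share permissions"),
    ("file-share read", "service account", "credentials stored in a script"),
    ("service account", "domain admin", "credential reuse on an admin account"),
    ("external", "vpn portal", "exposed login page"),
]


def shortest_attack_path(edges, start, goal):
    """Breadth-first search returning the findings on the shortest chain
    from start to goal, or None if no chain exists."""
    graph = {}
    for src, dst, finding in edges:
        graph.setdefault(src, []).append((dst, finding))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for nxt, finding in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [finding]))
    return None


for step, finding in enumerate(shortest_attack_path(EDGES, "external", "domain admin"), 1):
    print(step, finding)
```

In this model, individually low-severity findings appear as ordinary edges, which makes it easy to see when removing a single one breaks the whole chain.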

Phase 5 — Post-Exploitation: After gaining initial access, demonstrating the impact of that access. This includes privilege escalation, lateral movement, persistence establishment, data access, and exfiltration simulation. Post-exploitation is often where the most valuable findings emerge — the difference between what a vulnerability theoretically enables and what an attacker could realistically accomplish.

Phase 6 — Reporting: Documentation of all findings with evidence, business impact analysis, CVSS scoring, and remediation guidance. The report is the deliverable that provides lasting value after the engagement concludes.

Framework Comparison: PTES, OWASP, and NIST 800-115

Three frameworks dominate professional penetration testing methodology, each with different emphases and use cases.

The Penetration Testing Execution Standard (PTES) is the most comprehensive methodology for general-purpose enterprise penetration testing. It covers pre-engagement interactions, intelligence gathering, threat modeling, vulnerability analysis, exploitation, post-exploitation, and reporting in detail. PTES is practitioner-authored and reflects real-world attack progression rather than theoretical frameworks. Its weakness is that it has seen little maintenance since its initial publication.

The OWASP Web Security Testing Guide (WSTG) v4.2 is the authoritative methodology for web application penetration testing. It covers 91 distinct test cases across authentication, authorization, input validation, error handling, cryptography, and business logic. For application security teams, WSTG is the de facto standard. Its test case format provides clear pass/fail criteria that map to specific vulnerability classes and CWEs.

NIST SP 800-115 is the framework of record for federal and regulated industry penetration testing. It emphasizes documentation, authorization procedures, and risk assessment integration that compliance programs require. It is less technically detailed than PTES or WSTG but provides the process rigor that auditors expect.

For most enterprise engagements, practitioners use PTES as the overarching methodology, WSTG for web application components, and NIST 800-115 for compliance documentation requirements.

Toolchain by Phase

The quality of a penetration test is determined by the tester's skill in interpreting and chaining findings — not by the tools themselves. That said, specific tools have become standard for each phase because they produce reliable, reproducible results.

Reconnaissance: Maltego (relationship mapping and OSINT pivoting), Shodan and Censys (internet exposure intelligence), theHarvester (email and subdomain enumeration), Amass (comprehensive DNS reconnaissance and asset discovery).

Scanning and Enumeration: Nmap (network discovery and service identification, the universal baseline), Nessus or OpenVAS (vulnerability identification), Nikto (web server misconfiguration detection), Gobuster or FFuf (directory and endpoint brute-forcing), Nuclei (template-based vulnerability scanning with community-maintained check library).

Exploitation: Metasploit Framework (exploit development and delivery platform, post-exploitation modules), Burp Suite Professional (web application attack platform, the standard for application testing), CrackMapExec (Windows network exploitation and lateral movement), Impacket (Python library for Windows protocol exploitation, particularly useful for SMB and Kerberos attacks).

Post-Exploitation and Privilege Escalation: BloodHound (Active Directory attack path analysis), Mimikatz (credential extraction from Windows memory — requires appropriate authorization), PowerSploit and PowerView (PowerShell-based AD enumeration and exploitation), LinPEAS and WinPEAS (automated privilege escalation enumeration on Linux and Windows respectively), Cobalt Strike (commercial C2 framework for advanced adversary simulation — typically reserved for red team engagements rather than standard penetration tests).

Scoping Decisions That Determine Engagement Value

The scope of a penetration test determines what you learn from it. Poorly scoped engagements produce findings that do not reflect real attack risk; well-scoped engagements surface the vulnerabilities that matter most to your specific threat model.

Test type selection is the first scoping decision: black box (no prior knowledge, simulating an external attacker with no insider information), grey box (partial knowledge such as network diagrams, application documentation, or a standard user account), or white box (full knowledge including source code, architecture diagrams, and internal credentials). Black box tests are the most realistic simulation of an external attacker but the least time-efficient: testers spend a significant share of the engagement on reconnaissance that the client could shortcut by supplying the information up front. White box tests are the most comprehensive option for finding vulnerabilities in an application. Grey box tests balance realism with efficiency for most enterprise engagements.

Scope exclusions are as important as inclusions. Systems that cannot tolerate testing (production databases, OT/ICS equipment, real-time safety systems) should be explicitly excluded with documented justification. Vague exclusions like 'do not cause outages' are not testable and create liability ambiguity — specify exactly which systems are excluded and why.

Authentication scope determines whether the test includes pre-authentication attack surface (what can an unauthenticated attacker do?) and/or post-authentication testing (given a legitimate user account, what can they access that they should not?). Both should be included in comprehensive application security assessments.

What Separates a Useful Pentest Report from a Useless One

The penetration test report is the output that provides lasting value, and report quality varies enormously across testing providers. Before committing to a firm, request a sample report from a previous engagement (redacted for client confidentiality) and evaluate it against these criteria.

Findings should include: vulnerability title, affected system or component, CVSS base score with the justification for each metric, step-by-step reproduction instructions with screenshots or tool output as evidence, business impact narrative explaining what an attacker could accomplish using the finding, and specific remediation guidance with the exact configuration change or patch required.

Findings that lack reproduction evidence are not findings — they are hypotheses. A vulnerability listed without a screenshot showing successful exploitation, or a tool output demonstrating exploitability, cannot be independently verified and should not receive the same priority as evidenced findings.
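
The criteria above can be turned into a simple completeness check when reviewing a sample report. The sketch below models a finding as a record and lists which required elements are missing. The `Finding` class and its field names are hypothetical. The severity bands follow the CVSS v3.x qualitative rating scale (Low below 4.0, Medium below 7.0, High below 9.0, Critical from 9.0).

```python
from dataclasses import dataclass, field


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    """Hypothetical record for one report finding."""
    title: str
    affected_component: str
    cvss_score: float
    reproduction_steps: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # screenshot or tool-output references
    business_impact: str = ""
    remediation: str = ""

    def gaps(self) -> list[str]:
        """List the report elements this finding is still missing."""
        missing = []
        if not self.reproduction_steps:
            missing.append("reproduction steps")
        if not self.evidence:
            missing.append("evidence")
        if not self.business_impact:
            missing.append("business impact")
        if not self.remediation:
            missing.append("remediation guidance")
        return missing


# Illustrative finding with no evidence or remediation attached yet.
draft = Finding(
    title="SQL injection in login form",
    affected_component="customer portal",
    cvss_score=9.8,
    reproduction_steps=["Submit a crafted username to the login endpoint"],
    business_impact="Read access to the customer database",
)
print(cvss_severity(draft.cvss_score), draft.gaps())
```

A finding whose `gaps()` list includes "evidence" is, in the terms used above, still a hypothesis.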

The executive summary should communicate business risk, not technical findings. A summary that leads with CVE numbers and CVSS scores is not written for executive decision-makers. The summary should explain what the engagement's most significant findings mean for the organization in business terms: what data could be accessed, what operations could be disrupted, and what the priority remediation actions are.

Remediation guidance must be specific and actionable. 'Update all software to the latest version' is not remediation guidance. The specific package, the specific configuration change, or the specific architectural mitigation — with a reference to the vendor's fix documentation — is remediation guidance.

The bottom line

A penetration test is a snapshot of your security posture at a specific point in time from a specific attacker perspective. Its value is proportional to the quality of the scope definition, the methodology applied, the skill of the testers, and the accuracy and clarity of the report. Invest in the scoping process before the engagement begins: define the test type, authorization boundaries, out-of-scope systems, and success criteria explicitly. Evaluate providers on sample report quality and tester credentials (OSCP, CRTE, CRTO, or demonstrable CTF/research output) rather than on price alone. Remediate findings against the SLAs you set — a pentest that produces a report that sits unread provides zero security value.

Frequently asked questions

What is the difference between a penetration test and a vulnerability assessment?

A vulnerability assessment identifies and lists vulnerabilities present in a system, typically through automated scanning tools. It answers: what vulnerabilities exist? A penetration test goes further by attempting to exploit those vulnerabilities to demonstrate real-world impact. It answers: what can an attacker actually accomplish? A vulnerability assessment produces a list of findings; a penetration test produces evidence-backed attack paths with demonstrated business impact. Both are valuable, but they serve different purposes — vulnerability assessments are continuous hygiene; penetration tests validate whether your defenses hold against an active attacker.

How often should an organization conduct penetration testing?

At minimum, annual penetration testing of critical systems and external-facing applications is the baseline for most compliance frameworks (PCI DSS, for example, requires a penetration test at least annually and after any significant change). Organizations with higher risk profiles, such as financial services, healthcare, and critical infrastructure, should conduct external network pentesting quarterly, application testing with each major release, and a full internal network test at least semi-annually. Continuous testing (automated DAST tools, bug bounty programs) supplements but does not replace periodic manual penetration testing, which finds logic flaws and attack chain combinations that automated tools miss.

What is the difference between penetration testing and red teaming?

Penetration testing aims to identify as many exploitable vulnerabilities as possible within a defined scope and time window. Red teaming simulates a specific adversary attempting to achieve a specific objective (reach the domain controller, exfiltrate financial records) using the tactics, techniques, and procedures of that adversary type, without testing every vulnerability. A penetration test aims for breadth; a red team engagement aims for depth and realism. Red team engagements are typically longer (weeks to months), more expensive, and require greater security program maturity to extract value from, because the blue team needs to have deployed detections worth testing. Most organizations should be running annual penetration tests before considering red team engagements.

What credentials should I look for when hiring a penetration tester?

For network and infrastructure testing, OSCP (Offensive Security Certified Professional) is the baseline credential that demonstrates hands-on exploitation competency. For Active Directory and red team engagements, CRTE (Certified Red Team Expert) and CRTO (Certified Red Team Operator) indicate advanced Windows attack technique proficiency. For web application testing, BSCP (Burp Suite Certified Practitioner) and OSWE (OffSec Web Expert) demonstrate application attack depth. Beyond certifications, review the tester's public research, CVE discoveries, CTF achievements, and references from previous clients. Credentials without demonstrated practical output are insufficient for evaluating tester quality.

How do I ensure the penetration test does not disrupt production systems?

Preventing production impact requires explicit pre-test agreements and technical safeguards. Define prohibited techniques in the rules of engagement: no denial-of-service testing, no exploitation of vulnerabilities that could cause data corruption, no testing during specified maintenance windows. Provide an emergency stop contact — a named individual who can immediately halt testing if an incident occurs. For particularly sensitive systems, conduct testing in a staging environment first and limit production testing to low-impact reconnaissance and vulnerability identification without exploitation. Establish communication protocols for near-real-time coordination between the testing team and your incident response team during the engagement.

What should I do after receiving a penetration test report?

Treat the report as a remediation project, not a compliance deliverable. Immediately triage findings by severity and assign each to an owner with a remediation deadline. Critical findings (CVSS 9.0 and above, or any finding with a demonstrated critical business impact) should be remediated within 24 to 72 hours or mitigated with compensating controls while remediation is underway. Schedule a retesting engagement — either with the same firm or internally — to verify that critical and high findings are genuinely remediated rather than just closed administratively. Track remediation completion rate and mean time to remediate as program metrics reported to leadership.
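
The triage and tracking steps described above reduce to two small calculations: a deadline per finding and a mean time to remediate across closed findings. The sketch below shows both. Only the 72-hour window for critical findings comes from the guidance above; the other SLA values, the function names, and the dates are placeholders to replace with your own policy and data.

```python
from datetime import datetime, timedelta

# Assumed SLA table. Only the 72-hour critical window comes from the text;
# the remaining values are illustrative placeholders.
SLA = {
    "Critical": timedelta(hours=72),
    "High": timedelta(days=14),
    "Medium": timedelta(days=45),
    "Low": timedelta(days=90),
}


def remediation_deadline(severity: str, reported_at: datetime) -> datetime:
    """Return the date by which a finding of this severity must be fixed."""
    return reported_at + SLA[severity]


def mean_time_to_remediate(records) -> timedelta:
    """Average the fix durations over (reported_at, remediated_at) pairs
    for closed findings."""
    durations = [fixed - reported for reported, fixed in records]
    if not durations:
        raise ValueError("no closed findings to average")
    return sum(durations, timedelta()) / len(durations)


reported = datetime(2025, 1, 6, 9, 0)
print(remediation_deadline("Critical", reported))

closed = [
    (reported, reported + timedelta(days=2)),
    (reported, reported + timedelta(days=4)),
]
print(mean_time_to_remediate(closed))
```

Reporting the mean alongside the count of findings that missed their deadline gives leadership a clearer picture than either number alone.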

Can a penetration test miss vulnerabilities that matter?

Yes. Penetration tests are time-bounded and scope-limited. A tester working within a two-week engagement window will not exhaustively test every endpoint, every authentication flow, and every business logic path. Scope exclusions may leave significant attack surface untested. Black box tests spend significant time on reconnaissance that a real attacker with insider knowledge would skip, reducing time available for deep exploitation. Supplement penetration testing with continuous automated scanning (DAST, SAST), bug bounty programs for broad external coverage, and threat modeling to identify attack surfaces that automated tools and time-limited testers are most likely to miss.

Sources & references

  1. PTES — Penetration Testing Execution Standard
  2. OWASP — Web Security Testing Guide v4.2
  3. NIST SP 800-115 — Technical Guide to Information Security Testing
  4. MITRE ATT&CK — Enterprise Framework

Eric Bang
Author

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
