PRACTITIONER GUIDE | OFFENSIVE SECURITY

Purple Team Exercise Methodology: How to Run Collaborative Red-Blue Simulations That Actually Improve Detection

  • 3x more likely: Organizations that identify detection gaps through purple team exercises vs. post-incident review
  • Over 600: ATT&CK techniques in the current framework (v15)
  • 35-45%: Average detection coverage before a first purple team exercise in mature SOC environments
  • Under 5 minutes: Time from technique execution to SOC alert in well-tuned environments

A purple team exercise directly tests and improves your detection engineering by executing real attacker techniques in your environment with blue team visibility. The red team acts as an operator running specific MITRE ATT&CK techniques while the blue team watches logs and SIEM dashboards, observes what generates alerts, identifies what does not, and tunes detection logic in real time. The output is a measured increase in detection coverage — not a report of what the red team compromised.

This methodology is distinct from a traditional red team engagement (which pits a stealthy red team against live defenses) and from tabletop exercises (which do not execute anything). Purple teaming is the fastest path to improving detection capability against a specific threat actor profile.

Planning the Exercise: Scope, Threat Actor Selection, and ATT&CK Mapping

A purple team exercise without a scoped threat actor profile devolves into a generic technique checklist that produces unfocused results. The most valuable exercises are threat-actor-specific: select the adversary most likely to target your organization and test your detection coverage against their documented TTPs.

Step 1: Select the threat actor profile. Use threat intelligence to identify the 1-3 threat actors most likely to target your sector, geography, and technology stack. Consult:

  • MITRE ATT&CK Groups page: each documented group has ATT&CK technique mappings
  • Your threat intelligence platform's sector-specific reporting
  • CISA alerts for your sector
  • ISACs (Information Sharing and Analysis Centers) for your industry

Step 2: Extract ATT&CK techniques for the selected actor. For each selected threat actor, export their documented technique list from ATT&CK Navigator. Filter by: techniques with documented procedures (meaning specific implementation details are available, not just theoretical coverage), techniques relevant to your environment (discard cloud-specific techniques if the actor targets on-premises), and techniques where your current detection coverage is unknown or low.
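If you want to script this filtering step, a minimal sketch follows. It assumes you exported the selected group's techniques from ATT&CK Navigator as a layer file (JSON with a techniques array of techniqueID entries); the coverage map is a hypothetical stand-in for your own detection inventory.

```python
import json

# Assumed inputs: a Navigator layer exported for the selected group, plus a
# hypothetical map of techniques whose detection coverage you already know.
LAYER_PATH = "apt_group_layer.json"
KNOWN_COVERAGE = {"T1059.001": "high", "T1566.001": "medium"}

def load_group_techniques(path):
    """Return the technique IDs listed in a Navigator layer export."""
    with open(path) as fh:
        layer = json.load(fh)
    return sorted({t["techniqueID"] for t in layer.get("techniques", [])})

def unknown_or_low_coverage(technique_ids):
    """Keep only techniques whose current detection coverage is unknown or low."""
    for tid in technique_ids:
        coverage = KNOWN_COVERAGE.get(tid, "unknown")
        if coverage in ("unknown", "low"):
            yield tid, coverage

if __name__ == "__main__":
    for tid, coverage in unknown_or_low_coverage(load_group_techniques(LAYER_PATH)):
        print(f"{tid}\tcurrent coverage: {coverage}")
```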

Step 3: Prioritize by detection value. Not all techniques are equally worth testing. Prioritize techniques that:

  • Appear in the initial access, execution, and persistence phases (catching early-stage activity stops the intrusion)
  • Are used by multiple threat actors against your sector (one test validates detection coverage against several adversaries)
  • Map to existing detection rules you want to validate (confirming working detections is as valuable as finding gaps)
  • Your SOC team has no visibility data for (unknown coverage is highest priority)

Step 4: Map to atomic test cases. For each prioritized technique, identify executable test cases from the Atomic Red Team library (github.com/redcanaryco/atomic-red-team). Atomic Red Team provides specific commands, scripts, and tools for executing each ATT&CK technique. Each test case documents expected logs and artifacts for validation.
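To see which concrete test cases exist for a prioritized technique, a small sketch against a local clone of the Atomic Red Team repository can list them. The clone location below is an assumption, and the YAML layout (atomics/&lt;technique&gt;/&lt;technique&gt;.yaml with an atomic_tests list) reflects the repository structure at the time of writing.

```python
from pathlib import Path

import yaml  # PyYAML

# Assumed local clone of github.com/redcanaryco/atomic-red-team
ATOMICS_DIR = Path("atomic-red-team/atomics")

def list_atomic_tests(technique_id):
    """Yield (test name, supported platforms) for each atomic test of a technique."""
    atomic_file = ATOMICS_DIR / technique_id / f"{technique_id}.yaml"
    data = yaml.safe_load(atomic_file.read_text(encoding="utf-8"))
    for test in data.get("atomic_tests", []):
        yield test["name"], test.get("supported_platforms", [])

if __name__ == "__main__":
    for name, platforms in list_atomic_tests("T1059.001"):
        print(f"T1059.001 :: {name} :: {', '.join(platforms)}")
```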

Tooling: Atomic Red Team, CALDERA, and Commercial Platforms

Selecting the right execution platform determines the range of techniques you can test and the level of documentation generated during the exercise.

Atomic Red Team (open source, Red Canary)

A library of over 1,000 small, single-purpose tests mapped to ATT&CK techniques. Each 'atomic test' executes one specific implementation of a technique: a PowerShell one-liner that creates a scheduled task, a bash command that modifies cron, a .NET assembly that performs LSASS access. Best for: targeted technique validation, blue team training, and exercises where you want maximum control over exactly what executes. Run via Invoke-AtomicRedTeam PowerShell module. Excellent documentation of expected artifacts per test.

MITRE CALDERA (open source, MITRE)

A full adversary emulation platform with an autonomous agent (Sandcat) that runs on compromised endpoints and executes technique chains. CALDERA supports multi-step adversary emulation plans (comparable to threat actor playbooks), automated chaining of techniques based on discovered environment facts, and real-time operation monitoring. Best for: full intrusion simulation covering multiple phases, testing lateral movement and persistence chains, and validating detection across an intrusion lifecycle rather than individual techniques.

Prelude Operator (open source)

An adversary simulation platform with a GUI-based operation planner, support for multiple agent types, and TTX (tabletop exercise) integration. Prelude Community Edition is free; Pro adds managed content packs and team collaboration features. Best for: teams wanting a polished UI for managing exercise operations and tracking technique execution status.

Commercial platforms: AttackIQ, SafeBreach, XM Cyber

Commercial breach and attack simulation (BAS) platforms execute technique libraries on a continuous or scheduled basis and produce coverage reports mapped to ATT&CK. AttackIQ integrates with MITRE's ATT&CK Evaluations methodology. SafeBreach includes an extensive scenario library and automated remediation tracking. XM Cyber focuses on attack path simulation across cloud and on-premises environments. Best for: organizations wanting continuous validation at scale rather than point-in-time exercises.

Running the Exercise: The Execution Loop

The core purple team execution loop is: execute one technique, observe detection outcome, document result, tune if needed, move to next technique. This loop is repeated for each technique in the exercise scope.

Pre-execution checklist:

  • All exercise systems identified and tagged in SIEM (prevents analysts from escalating exercise activity as a real incident)
  • Change freeze in effect for exercise environment during the exercise window
  • Blue team has SIEM access and is actively monitoring
  • Snapshot or clean baseline taken for exercise machines (enables reset after destructive techniques)
  • Communication channel established between red and blue teams (Slack channel or call bridge)

Execution sequence for each technique:

  1. Red team pre-announces: the red team notifies the blue team: "We are about to execute T1059.001 (PowerShell) at 10:15 on HOST-01 using the following command: [exact command]." Pre-announcing eliminates ambiguity about what generated which alert.
  2. Execute: Red team executes the technique at the agreed time.
  3. Blue team observes (5-minute window): Blue team notes whether an alert fired, which rule triggered it, how quickly it fired, and the alert quality (true positive, tuning needed, or false positive).
  4. Document outcome: Record the technique ID, test case used, timestamp, expected artifacts, observed artifacts, alert fired (yes/no), alert quality (high/medium/low), and detection gap (if no alert).
  5. Tune if time allows: If a detection gap is identified and a fix is obvious, blue team tunes the rule in real time and re-executes the technique to validate the fix.

Exercise tracking template fields:

Technique ID | Technique Name | Test Case | Timestamp | Artifacts Expected | Artifacts Observed | Alert Fired | Rule Name | Alert Quality | Gap Category | Remediation Action | Owner | Due Date
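A minimal harness for this loop and the tracking template, assuming a Windows exercise host with the Invoke-AtomicRedTeam module installed and discoverable by PowerShell. The CSV path, the trimmed field set, and the example values are illustrative; the blue team's observations are filled in manually after the five-minute window.

```python
import csv
import os
import subprocess
from datetime import datetime, timezone

TRACKER = "purple_team_tracker.csv"   # assumed output path for the exercise log
FIELDS = ["technique_id", "test_case", "timestamp", "alert_fired",
          "rule_name", "alert_quality", "gap_category", "remediation_action"]

def execute_atomic(technique_id, test_number):
    """Run one atomic test via the Invoke-AtomicRedTeam PowerShell module."""
    subprocess.run(
        ["powershell", "-Command",
         f"Invoke-AtomicTest {technique_id} -TestNumbers {test_number}"],
        check=True,
    )

def record_outcome(**row):
    """Append one execution-loop result to the exercise tracker CSV."""
    write_header = not os.path.exists(TRACKER)
    with open(TRACKER, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)

if __name__ == "__main__":
    # One pass through the loop: pre-announce over the comms channel, execute,
    # then record what the blue team observed during the five-minute window.
    execute_atomic("T1059.001", 1)
    record_outcome(
        technique_id="T1059.001",
        test_case="Atomic Test #1",
        timestamp=datetime.now(timezone.utc).isoformat(),
        alert_fired="yes",                              # example value
        rule_name="Suspicious PowerShell execution",    # example value
        alert_quality="high",                           # example value
        gap_category="",
        remediation_action="",
    )
```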

Detection Gap Categories and Remediation Actions

Detection gaps identified during the exercise fall into distinct categories, each with a specific remediation approach.

Category 1: Telemetry gap — No log data exists for this activity. The technique executed but produced no logs in the SIEM. Remediation: deploy additional telemetry collection (Sysmon, EDR policy adjustment, Windows audit policy change, network sensor coverage expansion). Without the underlying log data, writing detection rules is impossible.

Example: PowerShell Script Block Logging is disabled — T1059.001 execution leaves no Script Block event in the SIEM. Fix: Enable PowerShell Script Block Logging (Event ID 4104) and Module Logging (Event ID 4103) via Group Policy.
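A quick validation sketch (Windows only) that checks whether those policy values are actually set on an exercise host; the registry paths are the standard PowerShell logging policy keys, but confirm them against your own GPO baseline.

```python
import winreg

# Policy values written by the PowerShell logging GPO settings (HKLM policy hive).
POLICIES = {
    "Script Block Logging (4104)":
        (r"SOFTWARE\Policies\Microsoft\Windows\PowerShell\ScriptBlockLogging",
         "EnableScriptBlockLogging"),
    "Module Logging (4103)":
        (r"SOFTWARE\Policies\Microsoft\Windows\PowerShell\ModuleLogging",
         "EnableModuleLogging"),
}

def policy_enabled(subkey, value_name):
    """Return True if the policy value exists and is set to 1."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            value, _ = winreg.QueryValueEx(key, value_name)
            return value == 1
    except FileNotFoundError:
        return False

for label, (subkey, value_name) in POLICIES.items():
    state = "enabled" if policy_enabled(subkey, value_name) else "NOT enabled"
    print(f"{label}: {state}")
```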

Category 2: Ingestion gap — Log data is generated but not reaching the SIEM. The source system generates the expected log (visible in local Event Viewer or endpoint agent) but it is not appearing in the SIEM. Remediation: fix log forwarding configuration, WEC subscription, or SIEM ingestion pipeline.

Category 3: Parsing gap — Log data is ingested but not correctly parsed. Raw log data arrives in the SIEM but fields are not extracted correctly, preventing field-based detection rules from matching. Remediation: fix or add SIEM parsing logic for the log source.

Category 4: Rule gap — Telemetry exists and is parsed but no detection rule covers it. All log data is available and searchable, but no detection rule exists for this technique. Remediation: write and deploy a new detection rule. Use SigmaHQ community rules as a starting point and adapt for your environment.

Category 5: Tuning gap — Rule exists but does not fire, or fires with too many false positives. A detection rule exists but is either suppressed due to false positive volume or has a logic error. Remediation: tune the rule logic, reduce suppression scope, or split the rule into a high-confidence variant and a lower-confidence informational variant.

Category 6: Analyst gap — Alert fires but is suppressed or not investigated. Alerts are generated but are buried in the alert queue or auto-closed by triage automation. Remediation: adjust alert priority, SOC triage procedures, or automation suppression logic.

Measuring and Reporting Exercise Outcomes

The primary output of a purple team exercise is a detection coverage delta — the measurable improvement in detection capability. Report these metrics:

Coverage metrics:

  • Techniques tested: total count of ATT&CK techniques executed
  • Detection rate (pre-exercise): percentage of techniques that generated a timely, accurate alert before any tuning during the exercise
  • Detection rate (post-exercise): percentage detected after real-time tuning during the exercise
  • Detection rate (30-day target): percentage committed to be detectable within 30 days, including fixes requiring telemetry deployment

Quality metrics:

  • Mean time to detect (MTTD): average time from technique execution to alert generation
  • Alert accuracy rate: percentage of fired alerts that were true positives vs. required tuning
  • Telemetry gap count: number of techniques with no underlying log data (measures logging maturity, not detection maturity)
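A minimal calculation sketch over the tracker CSV from the execution-loop harness earlier; the column names are the same hypothetical ones used there, and MTTD is omitted because the minimal harness does not capture the execution-to-alert time delta.

```python
import csv

TRACKER = "purple_team_tracker.csv"   # tracker produced by the harness above

with open(TRACKER, newline="") as fh:
    rows = list(csv.DictReader(fh))

executed = len(rows)
detected = sum(1 for r in rows if r["alert_fired"] == "yes")
accurate = sum(1 for r in rows
               if r["alert_fired"] == "yes" and r["alert_quality"] == "high")
telemetry_gaps = sum(1 for r in rows if r["gap_category"] == "telemetry gap")

print(f"Techniques tested:   {executed}")
print(f"Detection rate:      {detected / max(executed, 1):.0%}")
print(f"Alert accuracy rate: {accurate / max(detected, 1):.0%}")
print(f"Telemetry gap count: {telemetry_gaps}")
```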

Gap distribution by ATT&CK tactic: Break down gaps by tactic (Initial Access, Execution, Persistence, Privilege Escalation, etc.) to identify which phases of the kill chain have the weakest detection coverage. Most organizations find execution and persistence have better coverage than discovery and lateral movement.
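A compact illustration of that breakdown, using hypothetical per-technique results (the tracker CSV above would need an extra tactic column to produce this directly):

```python
from collections import Counter

# Hypothetical per-technique results: (tactic, gap category or None if detected)
results = [
    ("Execution", None),
    ("Persistence", "rule gap"),
    ("Discovery", "telemetry gap"),
    ("Lateral Movement", "telemetry gap"),
    ("Lateral Movement", "tuning gap"),
]

gaps_by_tactic = Counter(tactic for tactic, gap in results if gap is not None)
for tactic, count in gaps_by_tactic.most_common():
    print(f"{tactic}: {count} gap(s)")
```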

Trend tracking across exercises: Run exercises quarterly and track detection rate trends over time. The goal is a consistent upward trend. Flat or declining detection rate despite investment signals that the environment is growing faster than detection coverage — a scaling problem requiring architectural response.

The bottom line

Purple team exercises are the most direct path to measurable detection improvement because they replace assumption with measurement. Before your first exercise, you may assume 60% detection coverage; after it, you have evidence-based data showing which specific techniques generate alerts, which do not, and why. Schedule exercises quarterly with rotating threat actor profiles, use Atomic Red Team for technique-level validation, and track coverage metrics as KPIs for the detection engineering function. The investment is modest relative to a red team engagement, and the detection improvement is immediate and documented.

Frequently asked questions

What is the difference between a purple team exercise and a red team engagement?

A red team engagement tests the effectiveness of your defenses by having a skilled adversary attempt to achieve objectives (reach a target system, exfiltrate data) while evading detection. The outcome measures how far the red team gets. A purple team exercise is collaborative: the red team announces what they are executing and when, and the blue team observes whether their controls detect it. The outcome measures detection coverage, not red team success. Red team exercises test blue team capability under realistic conditions; purple team exercises improve that capability systematically.

How long should a purple team exercise take?

A focused single-threat-actor purple team exercise covering 20-30 ATT&CK techniques takes 2-3 days: one day for planning and environment preparation, one to two days for technique execution and real-time tuning. Broader exercises covering 50+ techniques or multiple threat actor profiles extend to a week. Most organizations find that diminishing returns set in after 2-3 days of continuous execution because blue team analysts fatigue and real-time tuning quality degrades. Schedule multiple shorter exercises quarterly rather than one exhaustive annual exercise.

Do I need a dedicated red team to run purple team exercises?

No. Internal purple team exercises can be run by a detection engineer or SOC analyst with security testing knowledge using Atomic Red Team. The person running Atomic Red Team tests does not need red team expertise — they are executing documented test cases, not performing creative offensive research. The value comes from the blue team's observation and tuning, not from the sophistication of the attacker. External red teams add value for more complex multi-stage simulations (CALDERA-based) or for exercises where the blue team should not know the exact playbook in advance.

How do I prevent purple team exercise activity from being escalated as a real incident?

Tag exercise machines in your CMDB and SIEM before the exercise begins. Create a tagging rule or label in your SIEM that marks all activity from exercise source IPs and hostnames as 'PT-EXERCISE-[date]' rather than suppressing alerts entirely — you want to see the alerts, just not have them escalate to an incident. Brief your SOC team and Tier 2/3 analysts before the exercise window. If you use a managed SOC or MSSP, notify them via a change ticket with the exact exercise time window, source hosts, and technique categories.

Which ATT&CK techniques should I test first if I have never done a purple team exercise?

Start with high-frequency techniques that appear in most threat actor TTPs: T1059.001 (PowerShell execution), T1053.005 (scheduled task creation for persistence), T1003.001 (LSASS memory access for credential dumping), T1021.002 (SMB/Windows Admin Shares for lateral movement), T1562.001 (disabling security tools), and T1070.001 (clearing Windows Event Logs). These six techniques cover the most common detection gaps and provide immediate value regardless of which specific threat actor you are worried about.

How do I measure detection coverage improvement after a purple team exercise?

Calculate detection rate as: techniques generating a timely accurate alert divided by total techniques executed. Run this calculation both before any real-time tuning (pre-exercise baseline) and immediately after the exercise concludes (post-exercise rate). Then measure again 30 days later after all committed remediation actions are completed (30-day target rate). A successful quarterly purple team program should show 5-15 percentage points of coverage improvement per exercise cycle, eventually plateauing near 70-80% detection rate as you reach techniques that require additional telemetry infrastructure investment.

Sources & references

  1. MITRE ATT&CK Framework
  2. Atomic Red Team by Red Canary
  3. MITRE CALDERA Adversary Emulation Platform
  4. CISA Purple Teaming Guidance
  5. Red Canary 2026 Threat Detection Report

Eric Bang
Author

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
