Practitioner Guide | May 20, 2026 | 13 min read

DevSecOps Metrics and KPIs: What to Measure, What to Report, and What to Ignore

Sources: DORA: Accelerate State of DevOps Report | OWASP: Application Security Verification Standard | NIST: Measuring and Improving Cyber Defense Capabilities | Google: DORA Metrics for Software Delivery Performance

Eric Bang

Founder & Cybersecurity Evangelist

60% of security findings go unremediated after 90 days (Veracode State of Software Security 2025)

MTTR: Mean Time to Remediate -- the single most actionable DevSecOps metric

4.5x more likely to meet reliability targets with elite DORA performance (Google DORA 2025)

False positive rate: The leading cause of developer scanner fatigue and finding dismissal

Most DevSecOps programs measure the wrong things. Vulnerability count, scan coverage percentage, and finding severity distribution are all reported upward to leadership as evidence of program health -- but none of them measure whether security is actually improving. A program that finds 10,000 vulnerabilities and remediates 2,000 is worse than a program that finds 500 and remediates 490. A scanner with a 40% false positive rate wastes developer time and trains engineers to dismiss security findings. The metrics that actually reflect DevSecOps program health are remediation speed, false positive rate, and whether newly deployed code has lower defect density than older code. This guide covers what to measure, how to present it, and which common metrics to stop reporting.

The Vanity Metrics Trap: What Not to Report

Before covering what to measure, clarify what to stop measuring and reporting as program health indicators.

Total vulnerability count: Reporting "we found 15,000 vulnerabilities this quarter" tells leadership nothing actionable. Finding count is a function of how many applications you scanned, how many scanners you deployed, and how sensitive your thresholds are -- not of how secure your software is. Organizations that expand scanner coverage will see finding counts increase even if underlying code quality improves.

Scan coverage percentage: "85% of repositories are scanned" sounds like progress but does not indicate whether the scanning is finding real issues, whether developers are remediating findings, or whether the scan configuration is appropriate for each application type.

Critical finding count in isolation: A single critical finding that is fixed in 2 days is better than 100 medium findings that sit unaddressed for 6 months. Severity distribution without remediation rate attached to it is an incomplete picture.

Tool adoption rate: "All teams are using SAST" is not a security outcome. A SAST deployment where developers have learned to dismiss all findings because the false positive rate is 60% provides no security value regardless of the adoption percentage.

The metrics that matter instead:

Remediation rate and MTTR (Mean Time to Remediate) by severity
False positive rate per scanner
Security defect density in new code vs. existing code (is the program preventing new defects?)
Escape rate (vulnerabilities found in production that were not caught in the pipeline)
Developer engagement rate (what percentage of assigned security findings are acted on vs. dismissed or ignored)

DORA Metrics and Their Security Implications

DORA (DevOps Research and Assessment) defines four key metrics for software delivery performance. Each has direct security implications that DevSecOps programs should monitor.

Deployment Frequency: How often code is deployed to production. High-performing teams deploy multiple times per day. Security implication: high deployment frequency means security scanning must be fast (under 10 minutes for blocking checks) or it becomes a bottleneck that pressures developers to bypass it. If your SAST scan takes 45 minutes, high-frequency deployment teams will disable or bypass it.

Lead Time for Changes: Time from code commit to production deployment. Security implication: if security scanning adds significant lead time, it creates pressure to reduce or eliminate security steps. Measure the security scanning contribution to lead time and target getting security scanning below 5% of total lead time.

Change Failure Rate: Percentage of deployments that cause production failure (requiring rollback or hotfix). Security implication: security-related deployment failures (a security misconfiguration causes an outage, a dependency upgrade breaks an interface) should be tracked separately to understand the security contribution to change failure rate.

Mean Time to Recovery (MTTR for incidents): Time to restore service after a production failure. Security implication: security incidents that cause production outages contribute to MTTR. Track security-caused incidents separately.
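
As a rough illustration of the Lead Time for Changes target above, the sketch below computes the share of lead time spent in security scanning. The function name and the sample durations are hypothetical; only the 5% threshold comes from this guide.

```python
def scan_share_of_lead_time(scan_minutes: float, lead_time_minutes: float) -> float:
    """Return security scanning time as a percentage of total lead time."""
    if lead_time_minutes <= 0:
        raise ValueError("lead_time_minutes must be positive")
    return 100 * scan_minutes / lead_time_minutes


# Hypothetical pipeline: 12 minutes of scanning inside a 6-hour
# commit-to-production lead time.
share = scan_share_of_lead_time(scan_minutes=12, lead_time_minutes=6 * 60)
print(f"Security scanning is {share:.1f}% of lead time")
print("Within 5% target" if share < 5 else "Over 5% target")
```

Tracking this percentage per pipeline, rather than as an organization-wide average, makes it easier to spot the specific teams most likely to feel pressure to bypass scanning.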

The security-specific extension to DORA: Add two metrics alongside the four DORA metrics:

Mean Time to Remediate (MTTR for vulnerabilities): Average time from vulnerability discovery to verified fix, segmented by severity (Critical: target less than 7 days; High: less than 30 days; Medium: less than 90 days).
Escape rate: Percentage of vulnerabilities discovered in production that should have been caught in the pipeline. High escape rate indicates either scanner coverage gaps or finding dismissal problems.
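
One way to compute escape rate is sketched below. It follows the definition above literally: of the vulnerabilities discovered in production, the percentage that a pipeline control should have caught. The `ProductionFinding` type and its `detectable_in_pipeline` triage flag are assumptions made for illustration; some teams instead divide escaped findings by all findings (pipeline plus production), so state the denominator whenever you report the number.

```python
from dataclasses import dataclass


@dataclass
class ProductionFinding:
    finding_id: str
    # Hypothetical flag set during triage: was this in scope for a pipeline scanner?
    detectable_in_pipeline: bool


def escape_rate(production_findings: list[ProductionFinding]) -> float:
    """Percentage of production-discovered findings the pipeline should have caught."""
    if not production_findings:
        return 0.0
    escaped = sum(1 for f in production_findings if f.detectable_in_pipeline)
    return 100 * escaped / len(production_findings)


# Hypothetical quarter: four production findings, three of which a scanner covers.
findings = [
    ProductionFinding("PROD-1", True),
    ProductionFinding("PROD-2", True),
    ProductionFinding("PROD-3", False),
    ProductionFinding("PROD-4", True),
]
print(f"Escape rate: {escape_rate(findings):.0f}%")
```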


Mean Time to Remediate: The Most Actionable Security Metric

Mean Time to Remediate (MTTR) -- the time from a vulnerability being identified to a verified fix being deployed -- is the single most actionable DevSecOps metric. It answers the question that matters: when we find a problem, how quickly do we fix it?

MTTR by severity tier (industry benchmarks):

Severity	Target MTTR	Industry Average (Veracode 2025)
Critical	Less than 7 days	58 days
High	Less than 30 days	96 days
Medium	Less than 90 days	196 days
Low	Less than 180 days	291 days

The gap between target and industry average is significant. Most organizations are not close to their stated SLAs for vulnerability remediation. Measuring actual MTTR against SLA reveals the gap; the next step is root cause analysis.

Common MTTR failure modes:

No owner assigned: Findings without a clear responsible engineer sit in a queue indefinitely. Every finding must be assigned to a named owner with accountability.
Findings not visible in developer workflow: A security dashboard that developers never look at does not drive remediation. Findings must appear in the tools developers use daily -- Jira tickets created automatically, GitHub PR comments, IDE extensions.
No escalation process: Findings approaching SLA deadline with no remediation action should trigger automatic escalation. Without escalation, SLAs are aspirational rather than enforced.
High false positive rate: Developers who dismiss findings because 40% are false positives will also dismiss real findings. Measure and reduce false positive rate as a prerequisite to improving MTTR.

Tracking MTTR in practice: Most SAST and vulnerability management platforms (Snyk, Veracode, Checkmarx, GitHub Advanced Security) provide MTTR reporting natively. If your scanners do not, use the Jira ticket creation date (when the finding became a tracked task) as the start time and the ticket resolution date as the end time. Because the clock starts at ticket creation rather than at discovery, this slightly underreports MTTR, but it is better than not tracking it.
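
The ticket-date approach can be sketched as follows. This is a minimal example, assuming tickets have already been exported as (severity, created, resolved) tuples; the ticket data is invented, and no real Jira API is used. The targets come from the severity table above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Target MTTR in days, taken from the severity table above.
TARGET_DAYS = {"Critical": 7, "High": 30, "Medium": 90, "Low": 180}


def mttr_by_severity(tickets):
    """Average days from ticket creation to resolution, grouped by severity.

    Each ticket is a (severity, created, resolved) tuple. Tickets that are
    still open (resolved is None) are skipped.
    """
    durations = defaultdict(list)
    for severity, created, resolved in tickets:
        if resolved is not None:
            durations[severity].append((resolved - created).days)
    return {sev: mean(days) for sev, days in durations.items()}


# Hypothetical ticket export.
tickets = [
    ("Critical", date(2026, 1, 5), date(2026, 1, 9)),    # 4 days
    ("Critical", date(2026, 1, 10), date(2026, 1, 20)),  # 10 days
    ("High", date(2026, 1, 1), date(2026, 3, 2)),        # 60 days
    ("High", date(2026, 2, 1), None),                    # still open, skipped
]

for severity, days in mttr_by_severity(tickets).items():
    status = "within" if days < TARGET_DAYS[severity] else "over"
    print(f"{severity}: {days:.1f} days ({status} target of {TARGET_DAYS[severity]})")
```

Skipping open tickets keeps the calculation simple but flatters the result, since long-running unresolved findings never enter the average. Reporting the count and age of open findings next to MTTR covers that blind spot.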

Shift-Left Effectiveness: Measuring Whether Prevention Is Working

Shift-left security means finding and fixing vulnerabilities earlier in the development lifecycle, where remediation costs less and risk is lower. Measuring whether your shift-left investment is actually shifting findings left is critical to demonstrating program value.

Security defect density by code age: Compare the vulnerability density (findings per 1,000 lines of code, or per component) for code written in the past 6 months versus code written before shift-left controls were deployed. If shift-left is working, new code should have lower defect density than older code. If new code has the same or higher defect density, the shift-left investment (SAST, SCA, developer training) is not producing results.
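
A minimal sketch of the comparison, using findings per 1,000 lines of code. The finding counts and line counts are hypothetical and exist only to show the arithmetic.

```python
def defect_density(findings: int, lines_of_code: int) -> float:
    """Findings per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return 1000 * findings / lines_of_code


# Hypothetical counts for illustration only.
new_code = defect_density(findings=18, lines_of_code=60_000)    # past 6 months
old_code = defect_density(findings=240, lines_of_code=400_000)  # before shift-left controls

print(f"New code: {new_code:.2f} findings per KLOC")
print(f"Old code: {old_code:.2f} findings per KLOC")
print("New code has lower density" if new_code < old_code else "No measurable improvement")
```

For the comparison to be fair, both figures should come from the same scanners with the same rule sets; otherwise a change in tooling can look like a change in code quality.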

Pipeline gate effectiveness: For each security control deployed as a pipeline gate (blocking deployment on critical findings), measure:

Findings caught by the gate per month (decreasing trend over time is good -- developers are fixing issues before they reach the gate)
False positive rate for the gate (high false positive rate means developers are learning to dismiss gate failures)
Gate bypass rate (how often is the gate disabled, bypassed, or its findings marked as accepted risk without proper review?)

Developer engagement metrics:

Percentage of assigned findings fixed vs. marked as accepted risk vs. dismissed
Average number of findings per developer (declining over time as developer security knowledge improves)
Time spent per finding (declining over time as fixes become routine for common finding types)

Security champion effectiveness: For programs with security champions (developers in each team with additional security training): compare defect density in teams with active security champions vs. teams without. Security champion programs that are not producing measurable defect density reduction should be revised.

Presenting shift-left effectiveness to engineering leadership: Frame it as developer productivity: "Security issues caught in code review take 30 minutes to fix. The same issue caught in production takes 8 hours of incident response plus 2 hours of emergency patch development. Our shift-left controls prevented 47 such incidents last quarter." Engineering leadership responds to developer time efficiency arguments more than abstract security improvement claims.
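
The arithmetic behind that statement can be made explicit. The sketch below uses only the figures quoted above (30 minutes per fix in code review, 8 + 2 hours per production incident, 47 incidents); the function name is hypothetical.

```python
REVIEW_FIX_HOURS = 0.5        # 30 minutes to fix in code review
PRODUCTION_FIX_HOURS = 8 + 2  # incident response plus emergency patch development
INCIDENTS_PREVENTED = 47


def hours_saved(incidents: int, early_cost: float, late_cost: float) -> float:
    """Engineering hours saved by fixing each issue early instead of in production."""
    return incidents * (late_cost - early_cost)


saved = hours_saved(INCIDENTS_PREVENTED, REVIEW_FIX_HOURS, PRODUCTION_FIX_HOURS)
print(f"{saved:.1f} engineering hours saved last quarter")
```

With the quoted figures this comes to 446.5 hours, a number engineering leadership can weigh directly against the cost of running the shift-left controls.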

Security Debt: Tracking and Communicating the Backlog

Security debt is the accumulated backlog of known vulnerabilities that have not been remediated. Tracking security debt provides a more honest picture of security posture than finding counts, and communicating it effectively to leadership drives remediation investment.

Security debt calculation: Count open findings segmented by severity and age:

Severity	Age	Risk Weight
Critical	Over 30 days	10
High	Over 60 days	5
Medium	Over 180 days	2
Low	Over 365 days	1

Multiply open finding count by risk weight and sum for a total security debt score. Track this score monthly. A rising score indicates debt accumulation faster than remediation; a declining score indicates debt reduction progress.
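
The score calculation can be sketched as below. It assumes that only findings older than the age threshold for their severity count toward the score, which is one reading of the table above; the backlog data is hypothetical.

```python
# (age threshold in days, risk weight) per severity, from the table above.
DEBT_RULES = {
    "Critical": (30, 10),
    "High": (60, 5),
    "Medium": (180, 2),
    "Low": (365, 1),
}


def security_debt_score(open_findings) -> int:
    """Sum risk weights over open findings older than their severity's threshold.

    open_findings is an iterable of (severity, age_in_days) pairs.
    """
    score = 0
    for severity, age_days in open_findings:
        min_age, weight = DEBT_RULES[severity]
        if age_days > min_age:
            score += weight
    return score


# Hypothetical backlog snapshot: the 10-day-old Critical is not yet counted.
backlog = [("Critical", 45), ("Critical", 10), ("High", 90), ("Medium", 200), ("Low", 400)]
print(f"Security debt score: {security_debt_score(backlog)}")
```

Running the same function against a monthly snapshot of open findings gives the trend line described above.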

The security debt aging report: Present open vulnerabilities in a table showing how long each finding has been open past its SLA deadline. A finding that is 200 days past its 30-day High severity SLA has fundamentally different risk than a finding that opened yesterday. The aging report communicates urgency in a way that finding counts alone do not.
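
A minimal sketch of an aging report, assuming findings are available as (id, severity, opened date) tuples. The SLA values are the remediation targets stated earlier in this guide; the finding IDs and dates are invented.

```python
from datetime import date

# SLA in days per severity, from the remediation targets in this guide.
SLA_DAYS = {"Critical": 7, "High": 30, "Medium": 90, "Low": 180}


def days_past_sla(severity: str, opened: date, today: date) -> int:
    """Days an open finding has exceeded its SLA; 0 if still within SLA."""
    age = (today - opened).days
    return max(0, age - SLA_DAYS[severity])


def aging_report(findings, today: date):
    """Return (finding_id, severity, days_past_sla) rows, most overdue first."""
    rows = [(fid, sev, days_past_sla(sev, opened, today)) for fid, sev, opened in findings]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical open findings.
open_findings = [
    ("VULN-2", "Critical", date(2026, 5, 19)),
    ("VULN-1", "High", date(2025, 10, 2)),
    ("VULN-3", "Medium", date(2026, 1, 1)),
]

for finding_id, severity, overdue in aging_report(open_findings, today=date(2026, 5, 20)):
    print(f"{finding_id}  {severity:<8}  {overdue} days past SLA")
```

Sorting by days past SLA rather than by severity puts the High finding that is 200 days overdue above a Critical finding opened yesterday, which is the point the aging report is meant to make.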

Communicating security debt to leadership: Map security debt to business risk: "Our 14 unpatched critical vulnerabilities in internet-facing applications represent a risk of unauthorized data access affecting customer PII. Each day these remain unpatched increases the probability of exploitation based on known attack campaigns against these vulnerability types." This framing connects security debt to business risk rather than abstract compliance with SLAs.

Debt reduction planning: Security debt reduction requires dedicated engineering time -- it does not reduce itself. Negotiate a monthly allocation of engineering time (typically 10-15% for teams with significant security debt) explicitly reserved for security debt reduction. Without a dedicated allocation, feature development will consistently be prioritized over security debt remediation.

The bottom line

DevSecOps metrics only have value if they drive decisions. Measuring MTTR and finding it is 200 days for High severity vulnerabilities should immediately trigger a conversation about owner assignment, escalation processes, and developer workflow integration. Measuring false positive rate and finding it is 55% should immediately trigger scanner configuration review and threshold adjustment. The metrics are not the goal -- the behavioral changes they reveal and drive are the goal. Start with three: MTTR by severity, false positive rate per scanner, and escape rate. These three metrics will surface the most important improvement opportunities in any DevSecOps program.

Frequently asked questions

How do I calculate false positive rate for a security scanner?

False positive rate (FPR) = (confirmed false positives / total findings) × 100. To measure: take a sample of 100-200 findings from a scanner over a defined period. Have a security engineer or developer review each finding and classify it as confirmed vulnerability, false positive, or accepted risk. Divide the false positive count by the total reviewed. For meaningful results, sample findings across multiple applications and severity levels. Most enterprise SAST scanners have FPRs between 20% and 50% without tuning; SCA tools (dependency scanners) have lower FPRs because vulnerability databases are more reliable than code analysis heuristics. Target a false positive rate below 15% for blocking pipeline gates.
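
The formula above can be applied to a reviewed sample as in this sketch. The classification labels ("confirmed", "false_positive", "accepted_risk") and the sample counts are assumptions chosen for illustration.

```python
from collections import Counter


def false_positive_rate(classifications: list[str]) -> float:
    """FPR as a percentage: confirmed false positives / total reviewed findings."""
    if not classifications:
        raise ValueError("no reviewed findings")
    counts = Counter(classifications)
    return 100 * counts["false_positive"] / len(classifications)


# Hypothetical review of a 100-finding sample from one scanner.
sample = ["confirmed"] * 62 + ["false_positive"] * 30 + ["accepted_risk"] * 8

fpr = false_positive_rate(sample)
print(f"False positive rate: {fpr:.1f}%")
print("OK for a blocking gate" if fpr < 15 else "Too noisy for a blocking gate")
```

Note that accepted-risk findings stay in the denominator: they were real findings, so they count toward the total reviewed but not toward the false positives.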

What is the DORA elite performance benchmark for security?

DORA defines four performance tiers (Elite, High, Medium, Low) for the four core DevOps metrics. Elite performers deploy multiple times per day with a less than 1-hour lead time for changes, a change failure rate below 5%, and MTTR under 1 hour for production failures. From a security perspective, elite DORA performance is a prerequisite for effective security response: organizations that take weeks to deploy changes cannot patch critical vulnerabilities within 7 days. If your organization's deployment frequency is less than weekly, the security SLA for Critical vulnerabilities must account for the deployment constraint -- acknowledging that the patch may be ready before it can be deployed.

How do I present security metrics to engineering leadership vs. security leadership?

Engineering leadership cares about developer velocity impact and business risk. Present MTTR in terms of engineering time: 'Security findings caught in CI/CD take 45 minutes to fix; findings caught in production require 8 hours of incident response.' Present security debt in terms of business risk: 'Our 8 unpatched critical findings in the payment API represent direct financial fraud exposure.' Security leadership cares about risk posture trends and coverage. Present escape rate, MTTR trends over time, and coverage metrics. Both audiences respond better to trend data ('MTTR improved from 120 days to 45 days over 6 months') than point-in-time snapshots.

How do I establish MTTR SLAs that developers will actually meet?

Start with realistic baselines, not aspirational targets. Measure your current actual MTTR before setting SLAs. If your current MTTR for High severity is 120 days, setting a 30-day SLA immediately puts you at 100% SLA breach rate and signals to developers that SLAs are not meaningful. Instead, set initial SLAs at 20% faster than current actual performance and improve by 20% each quarter. Involve engineering leads in setting SLAs -- SLAs imposed without engineering buy-in are not enforced. Attach consequences to SLA breach (escalation to engineering manager, required weekly status update on overdue findings) and rewards for SLA compliance (security metrics included in team OKRs).
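
The stepped approach can be sketched as a simple schedule. This example reads "20% faster" as 20% fewer days and stops tightening once the long-term target is reached; both are assumptions, and the function and parameter names are hypothetical.

```python
def sla_schedule(current_mttr_days: float, target_days: float,
                 step: float = 0.20, max_quarters: int = 12) -> list[int]:
    """Quarterly SLA values, each `step` tighter than the last, floored at the target."""
    schedule = []
    sla = current_mttr_days
    for _ in range(max_quarters):
        # Tighten by `step`, but never below the long-term target.
        sla = max(target_days, sla * (1 - step))
        schedule.append(round(sla))
        if sla <= target_days:
            break
    return schedule


# Current High-severity MTTR of 120 days, long-term target of 30 days.
print(sla_schedule(current_mttr_days=120, target_days=30))
```

For the 120-day example above, the schedule is 96, 77, 61, 49, 39, 31, and 30 days, so reaching the 30-day target takes seven quarters. Sharing that timeline with engineering leads up front makes the quarterly tightening predictable rather than a surprise.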

What tools provide DevSecOps metrics reporting?

Most enterprise application security platforms provide native metrics: Snyk provides MTTR reporting, vulnerability aging, and pipeline gate effectiveness. Veracode provides comprehensive metrics including policy compliance, MTTR, and developer engagement. GitHub Advanced Security provides code scanning metrics and dependency alert resolution time. For platforms that do not provide native metrics, export finding data to your SIEM or data warehouse and build dashboards in Splunk, Grafana, or Tableau. The OWASP DefectDojo open-source platform provides vulnerability management with MTTR tracking and finding lifecycle management, suitable for organizations aggregating findings from multiple scanners.

How do I measure whether developer security training is improving code quality?

Compare pre-training and post-training finding counts for the same developers on new code they write -- not total finding counts (which may increase if scanner coverage expands). Measure the specific CWE categories covered in training: if training covers SQL injection and XSS, track whether findings in those categories decline in code written after training. Track repeat findings: a developer who repeatedly introduces the same type of vulnerability is not retaining the training. Compare defect density (findings per thousand lines of code) for security-champion-enrolled developers vs. non-enrolled developers as a measure of security champion program effectiveness. Six months is the minimum evaluation period for meaningful before/after comparison.


Author

Eric Bang, CISSP

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
