Patch Management SLAs, Automation, and Cadence: Making the Operational Model Work
Vulnerability management programs identify what needs to be patched. The patch management program is the operational system that actually patches it — consistently, at scale, without breaking production, and within SLAs short enough to matter. These two functions are often owned by different teams, which is where the gap lives. This guide covers the operational side: how to define realistic SLAs, build patch pipelines with testing stages, select and configure automation tooling, and measure whether the program is actually closing the window of exposure before attackers exploit it.
SLA Definition by Severity and Asset Criticality
Patch SLAs define the maximum time from patch availability to deployment, differentiated by vulnerability severity and asset criticality. Flat SLAs (e.g., 'all critical patches within 30 days') do not reflect actual risk — a critical patch for an internet-exposed server is not equivalent to a critical patch for an air-gapped lab system.
Recommended SLA matrix:
| Severity | Internet-Exposed / Tier 1 Assets | Internal / Tier 2 Assets | Non-Critical / Tier 3 Assets |
|---|---|---|---|
| Critical (CVSS 9.0+) | 24-48 hours | 7 days | 30 days |
| High (CVSS 7.0-8.9) | 7 days | 14 days | 30 days |
| Medium (CVSS 4.0-6.9) | 30 days | 45 days | 90 days |
| Low (CVSS < 4.0) | 90 days | 90 days | Next scheduled maintenance |
Adjust SLAs for CISA KEV (Known Exploited Vulnerabilities): The CISA KEV catalog lists vulnerabilities actively exploited in the wild. Any KEV-listed vulnerability should trigger an emergency patch track regardless of base CVSS score — some KEV entries score as Medium but are being actively exploited for ransomware deployment.
Emergency patch track SLA: 24-48 hours for all Tier 1 assets, 72 hours for all other assets.
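Checking new CVEs against the KEV list can be scripted into triage. A minimal sketch using curl and jq against CISA's machine-readable feed (the feed URL is the published JSON endpoint at the time of writing; verify it on cisa.gov before relying on it):

```bash
#!/usr/bin/env bash
# Route a CVE to the emergency track if it appears in the CISA KEV catalog.
# Feed URL is CISA's published JSON endpoint; confirm it is still current.
KEV_URL="https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
CVE="${1:?usage: kev-check.sh CVE-YYYY-NNNNN}"

if curl -s "$KEV_URL" | jq -e --arg cve "$CVE" \
    '.vulnerabilities[] | select(.cveID == $cve)' > /dev/null; then
  echo "$CVE is in KEV - route to the emergency patch track"
else
  echo "$CVE is not in KEV - standard track applies"
fi
```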
SLA exceptions process: Some patches cannot be deployed within SLA — application compatibility issues, change freeze periods, or constraints that force reliance on compensating controls instead of the patch. Every exception needs: a documented business reason, an alternative compensating control (WAF rule, network isolation, monitoring alert), an exception expiration date, and security team approval. Exceptions without expiration dates accumulate into permanent technical debt.
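In practice the exception record can live in a ticket or a small structured file the security team reviews on a schedule. An illustrative sketch (the field names and schema are hypothetical, not from any specific GRC tool):

```bash
# Illustrative exception record - field names are hypothetical.
# The expiration date is what keeps exceptions from becoming permanent.
cat > exception-CVE-2024-XXXX.json <<'EOF'
{
  "cve": "CVE-2024-XXXX",
  "asset": "app-db-01",
  "business_reason": "Vendor application certified only against the unpatched version",
  "compensating_control": "WAF rule plus network isolation to the app subnet",
  "approved_by": "security-team",
  "expires": "2025-09-30"
}
EOF
```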
Patch Pipeline: Stages from Release to Production
Deploying patches directly to production is a reliability risk. Deploying them weeks later is a security risk. The patch pipeline balances both through staged deployment with defined dwell times at each stage.
Standard patch pipeline stages:
Stage 1: Patch acquisition and assessment (Day 0-1). Automated download from vendor sources. Initial triage: does this patch affect assets in your environment? Map to your asset inventory. Assign to the emergency, standard, or low-priority track based on severity and KEV status.
Stage 2: Lab / test environment (Day 1-3 for standard patches). Deploy to a representative test environment. Run application smoke tests. Check for known compatibility issues (vendor KB articles, vendor-published patch notes). For Windows patches, validate with a small sample of each OS version and application configuration in scope.
Stage 3: Pilot group (Day 3-7 for standard patches). Deploy to 5-10% of production assets, selected to represent the diversity of your environment (different OS versions, application configurations, network segments). Monitor for 48-72 hours: application errors, service failures, user reports.
Stage 4: General deployment (Day 7-14 for standard critical patches). Rolling deployment to remaining assets within SLA. For Windows environments, stagger deployment by site or OU to limit blast radius if a bad patch causes issues.
Stage 5: Verification and closure. Scan assets post-patch to confirm the CVE is no longer present. Do not rely solely on deployment confirmation from the patching tool — scan verification closes the loop. Update vulnerability management records.
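A lightweight spot check from the patching side can supplement the scanner while you wait for the next scan cycle. A minimal sketch for RHEL-family hosts (relies on `yum updateinfo`; the CVE argument is a placeholder):

```bash
#!/usr/bin/env bash
# Post-patch spot check: if the CVE still shows up in pending security
# updates on this host, the patch did not land. CVE ID is a placeholder.
CVE="${1:?usage: verify-patch.sh CVE-YYYY-NNNNN}"

if yum updateinfo list cves 2>/dev/null | grep -q "$CVE"; then
  echo "$CVE still pending on $(hostname) - patch missing or failed"
  exit 1
fi
echo "$CVE not in pending updates on $(hostname) - confirm with a scan"
```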
Emergency patch track (KEV / active exploitation): Compress stages: deploy to lab and pilot simultaneously (Day 0-1), begin general deployment by Day 2. Accept higher risk of compatibility issues in exchange for faster exposure reduction. Have rollback procedures pre-documented before starting.
Tooling: WSUS, Ivanti, BigFix, and Ansible
Patch management tooling falls into two categories: purpose-built patch management platforms and general automation tools adapted for patching.
Microsoft WSUS (Windows Server Update Services): Free, built into Windows Server. Manages Windows OS and Microsoft product patches. Lacks coverage for third-party applications, Linux, or macOS. Suitable for Windows-only environments on a limited budget. Common issues: WSUS databases require regular maintenance; reporting is limited; no automated deployment workflows without additional scripting or SCCM/Intune layered on top.
Microsoft Endpoint Configuration Manager (MECM/SCCM) / Intune: For Windows-centric enterprises, SCCM provides sophisticated patch deployment with software update groups, deployment rings, and compliance reporting. Intune extends management to cloud-joined and BYOD devices. Requires Microsoft licensing.
Ivanti Neurons for Patch Management: Multi-platform (Windows, Linux, macOS, third-party applications). Risk-based prioritization that integrates CVSS, EPSS, and threat intelligence. Automated patch testing workflows. Strong third-party application coverage (Chrome, Firefox, Java, Adobe products). Good choice for organizations with mixed OS environments and a need for third-party application patching at scale.
HCL BigFix: Agent-based patch management with strong support for heterogeneous environments (Windows, Linux, Unix, macOS). Near-real-time visibility and fast deployment speeds. Frequently used in large enterprise and regulated industry environments. Steeper learning curve than Ivanti.
Ansible / Chef / Puppet (infrastructure automation): Not purpose-built for patch management, but effective for Linux server patching at scale. Ansible playbooks for `yum update` or `apt upgrade` with a defined package scope provide reliable, auditable patching without per-device agents. Integrate with your change management system via API for automated ticket creation and closure.
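As a sketch of that integration point, ticket creation can be a thin wrapper called from the playbook or a pre-task script (the endpoint, token, and JSON fields below are placeholders, not a real product API):

```bash
# Hypothetical change-management API call - endpoint, token, and fields
# are placeholders; substitute your ticketing system's actual REST API.
curl -s -X POST "https://change.example.com/api/tickets" \
  -H "Authorization: Bearer ${CHANGE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "title": "Monthly Linux security patching",
        "change_type": "standard",
        "window_start": "2025-07-12T02:00:00Z",
        "target_group": "linux_servers"
      }'
```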
Third-party application patching: This is the most common gap. Browser plugins, Java, PDF readers, and productivity software are frequent exploit targets but are outside the scope of OS-focused tools. Ivanti, ManageEngine, and PDQ Deploy cover third-party application patching for Windows endpoints.
Linux Server Patching: Practical Patterns
Linux server patching is often less mature than Windows patching because there is no equivalent to WSUS or SCCM — teams build their own processes using package managers, automation tools, and scheduled maintenance windows.
Ansible-based patching approach:
```yaml
---
- name: Patch Linux servers
  hosts: linux_servers
  become: yes
  tasks:
    - name: Update all packages (RHEL/CentOS)
      yum:
        name: '*'
        state: latest
        security: yes                    # security-only patches
      when: ansible_os_family == "RedHat"

    - name: Update all packages (Debian/Ubuntu)
      apt:
        upgrade: safe                    # safe-upgrade avoids removing packages
        update_cache: yes
      when: ansible_os_family == "Debian"

    - name: Check if reboot is required (RHEL)
      command: needs-restarting -r       # exits 1 when a reboot is needed
      register: reboot_required
      failed_when: false
      changed_when: false
      when: ansible_os_family == "RedHat"

    - name: Check if reboot is required (Debian/Ubuntu)
      stat:
        path: /var/run/reboot-required   # created by apt when a reboot is needed
      register: debian_reboot
      when: ansible_os_family == "Debian"

    - name: Reboot if required
      reboot:
        msg: "Reboot required after patching"
        reboot_timeout: 300
      when: >
        (reboot_required.rc | default(0)) == 1 or
        (debian_reboot.stat.exists | default(false))
```
Reboot coordination: Kernel patches require a reboot. In production environments, coordinate reboots during maintenance windows. Use rolling reboots for clustered services — never reboot all nodes simultaneously.
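A minimal rolling-reboot sketch in plain shell (the hostnames and health check are illustrative; in Ansible the same effect comes from setting `serial: 1` on the play so hosts are processed one at a time):

```bash
#!/usr/bin/env bash
# Reboot cluster nodes one at a time, waiting for each to return before
# moving on. Hostnames and the SSH health check are illustrative.
HOSTS="node1 node2 node3"

for host in $HOSTS; do
  echo "Rebooting ${host}..."
  ssh "$host" 'sudo shutdown -r now' || true   # connection drops at reboot

  # Poll for up to ~10 minutes for SSH to come back before the next node
  for _ in $(seq 1 60); do
    sleep 10
    if ssh -o ConnectTimeout=5 "$host" 'true' 2>/dev/null; then
      echo "${host} is back online."
      break
    fi
  done
done
```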
Live patching (kpatch, livepatch): For critical kernel patches, live patching tools apply patches without a reboot. Available from RHEL (kpatch), Ubuntu (Canonical Livepatch), and SUSE. Suitable for reducing downtime on high-availability systems; not a substitute for scheduled maintenance window patching.
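Both ecosystems ship CLI tooling to confirm live-patch state; a quick status check (assumes the kpatch package or the Canonical Livepatch client is already installed and enabled):

```bash
# RHEL: list kernel live patches currently installed and loaded
sudo kpatch list

# Ubuntu: show Canonical Livepatch client status and applied fixes
sudo canonical-livepatch status
```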
Handling Patch Failures and Rollback
Every patch management program needs a defined procedure for what happens when a patch breaks something. Without it, teams either roll back inconsistently or leave broken systems in limbo while the vulnerability stays open.
Rollback decision criteria:
- Define in advance which failure types trigger immediate rollback vs. investigation
- Automatic rollback triggers: application fails to start post-patch, critical service outage, defined number of user-reported errors within 1 hour of deployment
- Investigation-first (no immediate rollback): intermittent errors, performance degradation below defined thresholds, single-system failures in a broad deployment
Windows rollback:
- Windows Update: `wusa.exe /uninstall /kb:XXXXXXX /quiet /norestart`
- SCCM: Create a deployment to uninstall the software update
- Pre-patch snapshots (VMware, Hyper-V): Revert snapshot if within the snapshot retention window (typically 24-72 hours)
Linux rollback:
```bash
# RHEL - list recent yum transactions, then undo the one that applied the patch
yum history list
yum history undo [transaction_id]

# Ubuntu/Debian - install a specific previous version of a package
apt-get install package=version
```
Snapshot discipline: Take VM snapshots before patching in test and pilot stages. Delete snapshots after the pilot period passes without incident — old snapshots consume storage and degrade VM performance over time. Do not rely on snapshots as your only rollback mechanism for production.
Metrics That Prove the Program Is Working
Patch management metrics should answer one business question: are we closing the window of exposure faster than attackers can exploit it?
Core metrics:
SLA compliance rate by severity tier: What percentage of patches are deployed within defined SLA? Track separately for Critical, High, and Medium. Target: 95%+ for Critical/High, 85%+ for Medium. SLA compliance below 80% for critical patches indicates a broken process that needs immediate attention.
Mean Time to Patch (MTTP) by severity: Average calendar days from patch release to deployment across all assets. Compare monthly. Downward trend indicates improvement; upward trend indicates backlog accumulation.
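MTTP is easy to compute from any export that pairs release and deployment dates. A minimal sketch in shell (the CSV layout and column order are hypothetical; requires GNU date):

```bash
# Average days from patch release to deployment, from a CSV export with
# hypothetical columns: cve,release_date,deploy_date (ISO dates).
awk -F, 'NR > 1 { print $2, $3 }' patches.csv | while read -r rel dep; do
  echo $(( ($(date -d "$dep" +%s) - $(date -d "$rel" +%s)) / 86400 ))
done | awk '{ sum += $1; n++ } END { if (n) printf "MTTP: %.1f days\n", sum / n }'
```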
Patch coverage rate: What percentage of in-scope assets are actually being patched? Blind spots in the asset inventory mean some assets are never reached by patching tools. Flag any asset not patched in 90 days as an active risk.
Exception count and age: How many active patch exceptions exist? What is the average age of exceptions? A growing exception backlog with old average ages indicates the exception process is becoming a backdoor for avoiding patching.
KEV remediation rate: For CISA KEV-listed vulnerabilities, what percentage are patched within the SLA? Report this metric to leadership — it directly represents exploitable risk.
Present to leadership: SLA compliance rate and KEV remediation rate as the two headline metrics. Avoid presenting raw vulnerability counts, which are meaningless without context — a high count of low-severity findings on well-managed systems does not represent the same exposure as a few critical flaws on internet-exposed assets.
The bottom line
Patch management is the operational execution layer of your vulnerability management program — and the layer where most organizations have the largest gap between policy and practice. Define SLAs that account for asset criticality, not just CVSS score. Build a staged pipeline that protects production without creating multi-week delays. Automate what can be automated, treat KEV listings as emergency patches, and measure SLA compliance and MTTP rather than patch counts. The goal is to consistently close the exploitation window faster than attackers can move through it.
Frequently asked questions
What is the difference between patch management and vulnerability management?
Vulnerability management identifies which vulnerabilities exist in your environment and prioritizes which to fix. Patch management is the operational process of deploying patches to remediate those vulnerabilities. Vulnerability management answers 'what needs to be fixed and in what order.' Patch management answers 'how do we fix it at scale, without breaking production, within defined SLAs.' Both functions are necessary; many organizations invest in vulnerability scanning without building the operational patch deployment capability to act on findings.
What patch management SLAs should organizations use?
SLAs should be tiered by both vulnerability severity and asset criticality. A common framework: Critical vulnerabilities on internet-exposed assets — 24-48 hours; Critical on internal assets — 7 days; High on internet-exposed — 7 days; High on internal — 14 days. Vulnerabilities listed in the CISA Known Exploited Vulnerabilities catalog should always trigger an emergency patch track regardless of CVSS score, as they are being actively exploited in the wild.
What tools are best for patch management at enterprise scale?
Windows-centric environments: Microsoft SCCM/Intune for OS patching plus Ivanti or ManageEngine for third-party applications. Mixed Windows/Linux environments: Ivanti Neurons or HCL BigFix provide multi-platform coverage. Linux-heavy environments: Ansible playbooks using yum/apt with defined package scope provide reliable, auditable patching without per-device agents. The most common tooling gap is third-party application patching (browsers, Java, PDF readers) — dedicated tools for this category are underused.
How should organizations handle patches that break production applications?
Define rollback criteria in advance: which failure types trigger immediate rollback vs. investigation-first. Maintain VM snapshots through the pilot phase for fast revert. For Windows, use `wusa.exe /uninstall` or SCCM deployment rollback. For Linux, use `yum history undo` or `apt-get install package=version`. Document every rollback, identify the root cause (application compatibility issue, patch defect), and track the vulnerability status — a rolled-back patch still represents open risk that requires a compensating control.
What is the CISA Known Exploited Vulnerabilities catalog and why does it matter for patching?
The CISA KEV catalog is a list of CVEs that CISA has confirmed are being actively exploited by real threat actors. It matters for patch management because CVSS score alone is a poor proxy for exploitation likelihood — some KEV entries score as Medium CVSS but are actively used in ransomware campaigns. Any vulnerability on the KEV list should trigger an emergency patch track, compressing your normal staged pipeline to 24-48 hours for critical assets.
How do you patch Linux servers at scale?
Ansible is the most common approach for Linux patching at scale: playbooks using the yum or apt modules with `state: latest` and the `security: yes` flag (RHEL) to restrict to security patches. For kernel patches requiring reboots, coordinate rolling reboots during maintenance windows — never reboot all clustered nodes simultaneously. For high-availability systems where reboot downtime is unacceptable, live patching tools (kpatch on RHEL, Canonical Livepatch on Ubuntu) can apply kernel patches without a reboot.
Sources & references
- CISA Known Exploited Vulnerabilities Catalog
- Ivanti State of Security Preparedness 2025
- SANS Vulnerability Management Survey 2024
- NIST SP 800-40r4: Guide to Enterprise Patch Management Planning