Certificate Lifecycle Management for Enterprise: Eliminating Cert Sprawl and Outages
TLS certificate expiration has caused outages at Microsoft (Teams and Azure), LinkedIn, Spotify, and dozens of enterprise IT environments that did not make the news. The pattern is consistent: a certificate was issued, added to a spreadsheet or forgotten entirely, and when it expired, the service it protected stopped working. In 2025, this problem is getting harder: certificate validity periods are shortening (the CA/Browser Forum has adopted an Apple-sponsored ballot that phases the maximum public TLS validity down to 47 days), the number of certificates per enterprise is growing (DevOps, microservices, and cloud-native architectures have multiplied certificate counts), and post-quantum migration will require replacing virtually every certificate in the enterprise. Certificate lifecycle management (CLM) has moved from a nice-to-have to a core infrastructure security capability.
The Certificate Sprawl Problem: How Enterprises Get Here
Understanding how certificate sprawl develops helps design governance that prevents it.
Manual certificate issuance without central tracking: The traditional PKI workflow: a team needs a certificate, they submit a CSR to an internal CA or external CA, receive the certificate, install it, and... nothing. No system is updated with the expiration date. No monitoring is configured. In 12 months (or 13 months, or 2 years), the certificate expires and the first notification is a user reporting an error.
DevOps and microservices multiplication: A monolithic application needed one certificate. The same application decomposed into 50 microservices may need 50 certificates — for TLS between services, for service identity in a service mesh, for mutual TLS (mTLS) authentication between components. Teams self-provisioning certificates without central visibility creates rapid sprawl.
Cloud and CDN certificates outside PKI visibility: AWS Certificate Manager, Azure App Service certificates, Cloudflare-managed certificates, and GCP-managed SSL certificates may be managed by individual application teams with no visibility from the central security or infrastructure team. These represent a growing fraction of total enterprise certificates.
Shadow certificates: Certificates issued directly to developers, operations staff, or external vendors without going through any central process. These often have long validity periods, weak key parameters, or are issued by CAs not on the organization's approved CA list.
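Validity period reduction forcing faster rotation: Historically, TLS certificates were valid for 2-3 years. Today, publicly trusted certificates are capped at 398 days and Let's Encrypt certificates are 90 days. Apple's CA/Browser Forum ballot (SC-081), adopted in April 2025, phases the maximum validity down to 200 days in March 2026, 100 days in March 2027, and 47 days in March 2029. At 47 days, every publicly trusted TLS certificate has to be renewed at least every 47 days, and in practice earlier to leave a safety margin. Manual renewal at that cadence is not realistic for a fleet of any size; automation is required.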
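To make the workload concrete, here is a rough back-of-the-envelope sketch. The fleet size and renewal intervals are illustrative assumptions, not figures from any survey, and the function name is made up for this example.

```python
def renewals_per_year(cert_count: int, renewal_interval_days: float) -> float:
    """Approximate renewal operations per year for a fleet of certificates.

    renewal_interval_days is how often each certificate is actually renewed,
    which is normally shorter than its maximum validity.
    """
    if renewal_interval_days <= 0:
        raise ValueError("renewal_interval_days must be positive")
    return cert_count * 365 / renewal_interval_days


# Hypothetical fleet of 1,000 certificates.
yearly = renewals_per_year(1000, 365)    # renewed once a year
monthly = renewals_per_year(1000, 30)    # renewed about every 30 days
print(f"annual cadence:  {yearly:.0f} renewals/year")
print(f"30-day cadence:  {monthly:.0f} renewals/year")
```

Under these assumptions the same fleet goes from roughly a thousand renewals a year to more than twelve thousand, which is the gap automation has to close.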
Certificate Discovery: Finding What You Don't Know About
CLM starts with discovery — you cannot manage certificates you do not know exist. Discovery has two components: network scanning and CA/vault integration.
Network scanning for TLS certificates: Scan all IP ranges and hostnames in your environment on ports 443, 8443, and any other HTTPS ports used. For each TLS connection, extract the certificate and record: common name, SANs, issuer CA, validity dates, key type and length, and whether the certificate chain is trusted.
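As a minimal sketch of what a scanner records per endpoint, the following uses only the Python standard library. The function names and the record layout are assumptions for illustration. Note that ssl.getpeercert() does not expose key type or key length; collecting those would need the DER form of the certificate and a separate parser, which is left out here.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Return the peer certificate as the dict produced by ssl.getpeercert().

    The default context validates chain and hostname, so an untrusted
    certificate raises ssl.SSLCertVerificationError instead of returning.
    A scanner can record that exception as "chain not trusted".
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def _flatten_name(name: tuple) -> dict:
    """Flatten the nested RDN tuples used for 'subject' and 'issuer'."""
    return {key: value for rdn in name for key, value in rdn}


def summarize_cert(cert: dict, now: datetime) -> dict:
    """Reduce a getpeercert() dict to the fields an inventory needs."""
    subject = _flatten_name(cert.get("subject", ()))
    issuer = _flatten_name(cert.get("issuer", ()))
    not_after = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return {
        "common_name": subject.get("commonName"),
        "sans": sans,
        "issuer": issuer.get("commonName") or issuer.get("organizationName"),
        "not_after": not_after,
        "days_remaining": (not_after - now).days,
    }
```

A scan loop would call fetch_peer_cert for each host and port in scope, pass the result to summarize_cert with datetime.now(timezone.utc), and store the record alongside the address it came from.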
Tools for certificate discovery:
- Censys.io / Shodan: Scan internet-facing assets for external certificate inventory
- Nmap with ssl-cert script: nmap -p 443 --script ssl-cert 10.0.0.0/8 for internal network scanning
- SSLyze / TestSSL.sh: TLS configuration analysis in addition to certificate extraction
- Venafi / Keyfactor / AppViewX: CLM platforms that include network discovery agents
CA and vault integration: Pull certificate inventory directly from:
- Internal Microsoft CA (Active Directory Certificate Services): certutil -view csv (add -out with a column list to select fields) or PowerShell CertificationAuthority module
- HashiCorp Vault PKI secrets engine: API query for all issued certificates
- AWS Certificate Manager: aws acm list-certificates
- Azure Key Vault: az keyvault certificate list
- Let's Encrypt: ACME client records and Certificate Transparency logs, since Let's Encrypt's ACME API does not offer a listing of the certificates issued to an account
Reconcile discovery sources: Network scan results and CA records rarely match perfectly. Certificates found in network scanning but not in CA records are shadow certificates (issued by an unapproved CA or issued by an approved CA outside the tracked workflow). Certificates in CA records but not found in network scanning may be installed in non-standard locations, on systems not in your IP inventory, or already revoked but not removed from CA records.
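The reconciliation itself is a set comparison. The sketch below assumes both sources can be reduced to SHA-256 certificate fingerprints; the function names and the bucket labels are illustrative.

```python
def normalize_fingerprint(fingerprint: str) -> str:
    """Lowercase a hex fingerprint and drop colon or space separators."""
    return fingerprint.replace(":", "").replace(" ", "").lower()


def reconcile(scanned: set, ca_issued: set) -> dict:
    """Compare fingerprints seen on the network with fingerprints in CA records."""
    scanned_norm = {normalize_fingerprint(f) for f in scanned}
    issued_norm = {normalize_fingerprint(f) for f in ca_issued}
    return {
        # On the network but unknown to any tracked CA: shadow certificates.
        "shadow": scanned_norm - issued_norm,
        # In CA records but not observed: unscanned hosts, odd ports, or retired.
        "unlocated": issued_norm - scanned_norm,
        # Present in both sources.
        "matched": scanned_norm & issued_norm,
    }


result = reconcile(
    scanned={"AA:BB:01", "cc:dd:02"},
    ca_issued={"aabb01", "EEFF03"},
)
print(sorted(result["shadow"]), sorted(result["unlocated"]), sorted(result["matched"]))
```

Each bucket then gets its own follow-up: shadow certificates need an owner and a decision, unlocated ones need either a scan scope fix or cleanup of the CA records.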
ACME Protocol and Automated Certificate Renewal
Manual certificate renewal does not scale below 90-day validity periods. The ACME (Automatic Certificate Management Environment) protocol, standardized in RFC 8555, provides the automation foundation for certificate lifecycle management.
How ACME works:
- ACME client on the server generates a key pair and CSR
- Client requests a certificate from the ACME CA
- CA issues a challenge to prove domain control (HTTP-01, DNS-01, or TLS-ALPN-01)
- Client completes the challenge
- CA issues the certificate
- Client installs the certificate and schedules the next renewal
This entire process runs automatically on a schedule — no human involvement required after initial setup.
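The "schedules the next renewal" step usually comes down to a simple rule. The sketch below assumes a policy of renewing once the last third of the certificate's lifetime begins, which is a common convention but not something RFC 8555 mandates; the function name and the default fraction are assumptions.

```python
from datetime import datetime, timedelta, timezone


def renewal_due(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    renew_fraction: float = 1 / 3,
) -> bool:
    """Return True once the remaining lifetime drops to renew_fraction of the total."""
    if not 0 < renew_fraction < 1:
        raise ValueError("renew_fraction must be between 0 and 1")
    lifetime = not_after - not_before
    renew_at = not_after - lifetime * renew_fraction
    return now >= renew_at


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(renewal_due(issued, expires, issued + timedelta(days=50)))  # before the window
print(renewal_due(issued, expires, issued + timedelta(days=70)))  # inside the window
```

Expressing the window as a fraction rather than a fixed number of days means the same rule keeps working as validity periods shrink.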
ACME clients:
- Certbot: The reference ACME client from EFF. Widely deployed for Let's Encrypt. Supports Apache and Nginx plugins for automatic installation and reload.
- acme.sh: Lightweight shell script ACME client. Good for environments where Python (required by Certbot) is not available.
- Caddy: Web server with built-in ACME support. Automatic HTTPS with zero configuration.
- cert-manager (Kubernetes): Native Kubernetes certificate lifecycle management. Integrates with Let's Encrypt, Vault, Venafi, and internal CAs. Essential for Kubernetes-native certificate management.
ACME for internal CAs: ACME is not just for public CAs. Microsoft ADCS supports ACME via a third-party module. HashiCorp Vault PKI includes a native ACME endpoint. Smallstep step-ca provides a full ACME-compliant internal CA. Running ACME against your internal CA gives you the same automation for internal certificates that Let's Encrypt provides for public TLS.
Challenge types:
- HTTP-01: CA verifies domain ownership by requesting a file at http://[domain]/.well-known/acme-challenge/[token]. Requires port 80 to be accessible from the CA.
- DNS-01: CA verifies domain ownership by requesting a DNS TXT record at _acme-challenge.[domain]. Required for wildcard certificates. Requires API access to your DNS provider for automation.
- TLS-ALPN-01: Validation via a temporary TLS certificate on port 443. No port 80 or DNS API required.
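What the client actually publishes for HTTP-01 and DNS-01 is derived from the challenge token and the account key. The sketch below follows the key authorization rules in RFC 8555 and the JWK thumbprint rules in RFC 7638 using only the standard library; the function names are illustrative and the JWK in the usage lines is a placeholder, not a real key. In practice an ACME client handles all of this for you.

```python
import base64
import hashlib
import json

# Required JWK members per key type, as used for the RFC 7638 thumbprint.
_THUMBPRINT_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 thumbprint of the account public key."""
    members = {name: jwk[name] for name in _THUMBPRINT_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    """Token and account key thumbprint joined by a dot."""
    return f"{token}.{jwk_thumbprint(jwk)}"


def http01_path(token: str) -> str:
    """Path the CA fetches over port 80; the body is the key authorization."""
    return f"/.well-known/acme-challenge/{token}"


def dns01_record_name(domain: str) -> str:
    """TXT record name; a wildcard is validated at its base domain."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def dns01_txt_value(token: str, jwk: dict) -> str:
    """TXT record value: base64url SHA-256 digest of the key authorization."""
    digest = hashlib.sha256(key_authorization(token, jwk).encode("ascii")).digest()
    return b64url(digest)


# Placeholder account key and token, for illustration only.
example_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y"}
print(http01_path("example-token"))
print(dns01_record_name("*.example.com"), dns01_txt_value("example-token", example_jwk))
```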
Enterprise CLM Platforms: Venafi, Keyfactor, and AppViewX
For large enterprises with thousands of certificates across multiple CAs and platforms, a dedicated CLM platform is required. Spreadsheets and manual tracking do not scale.
Venafi Trust Protection Platform: The market leader for enterprise CLM. Discovers certificates across the full environment (network scanning, CA integration, cloud integration), provides a centralized inventory, enforces policy (approved CAs, minimum key lengths, maximum validity periods), automates renewal via ACME and proprietary connectors, and integrates with DevOps toolchains (Kubernetes, HashiCorp Vault, Terraform). Strong for large, heterogeneous environments with multiple CAs and complex certificate policies.
Keyfactor Command: Full-featured CLM with built-in CA capabilities (Keyfactor includes its own CA in addition to managing external CAs). Strong for organizations that want to modernize their internal PKI alongside CLM implementation. EJBCA PKI included in the platform provides enterprise-grade CA functionality without Microsoft ADCS dependency.
AppViewX CERT+: CLM focused on network and security infrastructure — load balancers, ADCs, firewalls. Strong F5, NetScaler, and A10 integrations. Good for organizations whose primary certificate management pain is network infrastructure appliances where other tools have weaker coverage.
AWS Certificate Manager (ACM): Free for certificates used with AWS services (ELB, CloudFront, API Gateway). Automatic renewal for ACM-managed certificates. Does not extend to on-premises or non-AWS infrastructure. Effective for AWS-native workloads; not a complete enterprise CLM solution.
HashiCorp Vault PKI Secrets Engine: For organizations using Vault as their secrets management platform, the PKI secrets engine provides certificate issuance with short-lived certificates (days to hours), automatic renewal, and ACME support. Excellent for service-to-service TLS in microservices environments. Not a full CLM platform — no network discovery, limited visibility into non-Vault-issued certificates.
Evaluation criteria:
- Discovery coverage (network scanning + CA integration + cloud platform integration)
- Automation connectors for your specific infrastructure (F5, nginx, Kubernetes, AWS, Azure)
- ACME support for internal and external CAs
- Policy enforcement (approved CAs, key requirements, validity limits)
- SIEM/alerting integration for expiration alerts and policy violations
- Workflow for certificate requests from developers (self-service with guardrails)
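When comparing platforms on the policy enforcement criterion above, it helps to be clear about what such a check does. The following is a minimal sketch of a policy check against an inventory record; the class names, fields, and threshold values are assumptions for illustration and do not reflect any vendor's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertRecord:
    common_name: str
    issuer: str
    key_type: str       # for example "RSA" or "EC"
    key_bits: int
    validity_days: int


@dataclass(frozen=True)
class Policy:
    approved_issuers: frozenset
    min_key_bits: dict      # minimum key size per allowed key type
    max_validity_days: int


def policy_violations(cert: CertRecord, policy: Policy) -> list:
    """Return a human-readable list of policy violations (empty if compliant)."""
    problems = []
    if cert.issuer not in policy.approved_issuers:
        problems.append(f"issuer not approved: {cert.issuer}")
    minimum = policy.min_key_bits.get(cert.key_type)
    if minimum is None:
        problems.append(f"key type not allowed: {cert.key_type}")
    elif cert.key_bits < minimum:
        problems.append(f"key too small: {cert.key_bits} < {minimum}")
    if cert.validity_days > policy.max_validity_days:
        problems.append(
            f"validity too long: {cert.validity_days} > {policy.max_validity_days}"
        )
    return problems


# Hypothetical policy values.
policy = Policy(
    approved_issuers=frozenset({"Example Issuing CA"}),
    min_key_bits={"RSA": 2048, "EC": 256},
    max_validity_days=398,
)
legacy = CertRecord("old.example.internal", "Unknown CA", "RSA", 1024, 730)
for problem in policy_violations(legacy, policy):
    print(problem)
```

The same check can run at request time (to block non-compliant issuance) and against discovery results (to flag what already exists).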
PKI Design for Enterprise Environments
Good CLM governance requires sound PKI architecture underneath it. Common PKI design mistakes create downstream CLM complexity.
Two-tier PKI hierarchy (recommended for most enterprises):
Offline Root CA
└─ Issuing CA (online, issues end-entity certs)
   └─ End-entity certificates (TLS, code signing, user auth)
The Root CA is kept offline (powered off and stored in a secure location). Compromise of the Root CA compromises your entire PKI — offline storage prevents this. The Issuing CA is online and issues certificates for day-to-day use. If the Issuing CA is compromised, you revoke it and issue a new one from the offline Root.
Three-tier PKI hierarchy (large enterprises with multiple divisions):
Offline Root CA
└─ Policy CA (offline, sets policy)
   ├─ Issuing CA 1 (division / geography)
   └─ Issuing CA 2 (division / geography)
Adds a Policy CA between Root and Issuing CAs for organizational separation. Appropriate when different business units need different certificate policies (e.g., healthcare division with HIPAA requirements vs. general enterprise).
Common PKI design mistakes:
- Single online CA: The CA is both Root and Issuing. Compromise of the online CA requires replacing the entire PKI trust anchor.
- Excessively long Root CA validity: 20-year root CA validity means the key material must be protected for 20 years. 10 years is more manageable.
- No HSM for CA key storage: Root and Issuing CA private keys stored on software-only keystores (file system or Windows certificate store) are vulnerable to extraction. HSMs prevent key extraction by enforcing key operations within the hardware boundary.
- No CRL/OCSP: Without a functioning Certificate Revocation List or OCSP responder, revoked certificates continue to be trusted by clients. Verify CRL/OCSP infrastructure is operational before deploying a PKI.
- Missing CA certificate in trust stores: If the Issuing CA certificate is not in the trust store of all clients and services, certificates issued by it produce trust errors. Distribute CA certificates via GPO (Windows), MDM (macOS/iOS), or configuration management (Linux).
Monitoring, Alerting, and Expiration Prevention
Even with automation, monitoring is required — ACME renewal failures happen, and manual certificates exist where automation has not yet reached.
Expiration monitoring tiers:
90 days before expiration: Flag in CLM platform. Assign to owner for renewal planning if manual. Verify automation is configured and functioning if automated.
30 days before expiration: Alert to certificate owner and their manager. For automated certificates, trigger renewal test if not already renewed. For manual certificates, begin renewal process immediately.
14 days before expiration: Escalate to infrastructure team lead. For any certificate not yet renewed at this point, treat as an incident rather than a normal workflow item.
7 days before expiration: Executive-level visibility. Any certificate expiring in 7 days that is not yet renewed should be in a war room conversation — what does it protect? What is the impact? What is blocking renewal?
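The tiers above map directly onto a small lookup that an alerting job can apply to every certificate in the inventory. The tier labels below are made up for this sketch; the thresholds are the ones listed above.

```python
def escalation_tier(days_remaining: int) -> str:
    """Map days until expiration to the escalation tier described above."""
    if days_remaining <= 7:
        return "executive"          # war room: impact and blockers
    if days_remaining <= 14:
        return "incident"           # escalate to infrastructure team lead
    if days_remaining <= 30:
        return "owner-and-manager"  # alert owner and manager, start renewal
    if days_remaining <= 90:
        return "flag"               # flag in CLM platform, verify automation
    return "none"


for days in (120, 60, 21, 10, 3):
    print(days, escalation_tier(days))
```

Already-expired certificates have a negative days_remaining and fall into the most urgent tier, which is the behavior you want from a fail-safe default.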
Monitoring implementation options:
- CLM platform alerts: The primary mechanism for tracked certificates
- Prometheus + ssl-exporter: Scrape TLS endpoints and export expiration metrics. Alert via Alertmanager when expiration is within threshold.
- Nagios / Zabbix plugins: check_ssl_cert provides expiration monitoring for specific hosts/ports
- Datadog / New Relic: Both provide TLS certificate expiration checks that can alert ahead of expiry
Post-quantum certificate migration: Current RSA and ECDSA certificates will be cryptographically vulnerable to quantum computers capable of running Shor's algorithm at scale. NIST finalized post-quantum cryptographic standards in 2024 (ML-KEM, ML-DSA, SLH-DSA). CLM platforms will be the primary mechanism for executing the migration — replacing every certificate in the enterprise with post-quantum algorithm certificates. Organizations should ensure their CLM inventory is complete before this migration begins, as you cannot migrate certificates you do not know about.
The bottom line
Certificate lifecycle management is no longer an optional capability for large enterprises — shortening validity periods, expanding certificate counts from cloud and microservices architectures, and the upcoming post-quantum migration make manual processes and spreadsheet tracking permanently inadequate. Start with discovery to establish a complete inventory, deploy ACME automation for public-facing and internal certificates where possible, implement a CLM platform when certificate counts exceed what manual tracking can handle, and build expiration monitoring with escalating alert tiers. The enterprises that have suffered public certificate expiration outages all had the same root cause: they did not know a certificate existed until it expired.
Frequently asked questions
What is certificate lifecycle management (CLM)?
Certificate lifecycle management is the set of processes and tools that track, monitor, and automate the renewal of TLS/SSL and other PKI certificates throughout their validity period. It encompasses discovery (finding all certificates in use), inventory (centralized tracking with expiration dates and ownership), automation (renewing certificates before expiration without manual intervention), and governance (enforcing certificate policies like approved CAs, minimum key lengths, and maximum validity periods).
What is the ACME protocol and how does it automate certificate renewal?
ACME (Automatic Certificate Management Environment, RFC 8555) is a protocol that allows servers to automatically obtain and renew TLS certificates by proving domain control to a CA without human intervention. The ACME client generates a key pair, completes a domain validation challenge (HTTP-01, DNS-01, or TLS-ALPN-01), receives the certificate, installs it, and schedules the next renewal. Let's Encrypt uses ACME, as do internal CAs like HashiCorp Vault PKI and step-ca.
Why are certificate expiration outages still happening at large enterprises?
The root cause is almost always the same: a certificate was not tracked or monitored. Contributing factors include manual certificate issuance without central inventory, development teams self-provisioning certificates outside visibility of the CLM program, certificates in cloud services managed independently by application teams, and spreadsheet-based tracking that is not actively maintained. With certificate validity periods shortening, manual tracking is no longer viable; automation and centralized platforms are required.
What is the difference between Venafi, Keyfactor, and AWS Certificate Manager?
Venafi and Keyfactor are enterprise CLM platforms that discover, inventory, and automate certificates across heterogeneous environments — on-premises, cloud, network appliances, and Kubernetes — from any CA. AWS Certificate Manager is a free AWS service that manages certificates for AWS services only, with automatic renewal. ACM is not a complete enterprise CLM solution but is essential for AWS-native workloads. Venafi and Keyfactor cover the full enterprise including ACM-managed certificates alongside everything else.
How short will TLS certificate validity become?
Let's Encrypt certificates are 90 days. Apple's ballot in the CA/Browser Forum (Ballot SC-081), adopted in April 2025, reduces maximum public TLS certificate validity in stages: 200 days from March 2026, 100 days from March 2027, and 47 days from March 2029. Once the 47-day limit applies, all publicly trusted TLS certificates will need to be renewed at least every 47 days, and in practice sooner. At that cadence, manual renewal is not realistic; ACME automation or a CLM platform is required for every public-facing TLS certificate.
What is a two-tier PKI hierarchy and why is it recommended?
A two-tier PKI hierarchy separates the Root CA (kept offline and powered off except during maintenance) from the Issuing CA (online, issues day-to-day certificates). The Root CA's private key is the trust anchor for the entire PKI — keeping it offline prevents compromise. If the Issuing CA is compromised, it can be revoked and replaced using the offline Root CA without rebuilding the entire PKI trust chain. A single online CA that serves as both Root and Issuing combines the risk of an online system with the consequences of Root CA compromise.
Sources & references
- NIST SP 800-57: Recommendation for Key Management
- IETF RFC 8555: Automatic Certificate Management Environment (ACME)
- Keyfactor PKI and Machine Identity Management Survey 2025
- Gartner Market Guide for Machine Identity Management 2025
Founder & Cybersecurity Evangelist, Decryption Digest
Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.