67%
Of organizations running Kubernetes had a security incident related to a container or Kubernetes misconfiguration in the past 12 months, per Red Hat 2025 State of Kubernetes Security
94%
Of Kubernetes security incidents are caused by misconfigurations rather than exploitation of software vulnerabilities
3x
More Kubernetes deployments in production than three years ago; the attack surface has grown proportionally
CIS Level 1
The minimum hardening baseline recommended by CISA, NSA, and most cloud security frameworks for any production Kubernetes cluster

Kubernetes has become the default runtime for containerized workloads across enterprise environments. Its security model is powerful but complex: dozens of configurable components, RBAC policies spanning multiple resource types, and a default configuration that prioritizes ease of deployment over security hardening.

The consequence is predictable. 94% of Kubernetes security incidents are caused by misconfiguration, not by zero-days. Overly permissive RBAC, missing network policies, privileged pods, exposed dashboards, and unrotated credentials are the attack surface. This is good news for defenders: misconfiguration is fixable with systematic controls, without waiting for vendor patches.

This checklist is organized by control domain and maps to the CIS Kubernetes Benchmark v1.9 and NSA/CISA Kubernetes Hardening Guidance. It is designed for platform engineers, DevSecOps practitioners, and security architects hardening production clusters.

Control Plane Hardening

The Kubernetes control plane (API server, etcd, controller manager, scheduler) is the highest-value target in any cluster. Compromise of the control plane provides full cluster access.

API server hardening is the most critical control. The API server should never be directly exposed to the internet. If managed Kubernetes (EKS, AKS, GKE) with a public endpoint is required, enable authorized networks to restrict access to specific IP ranges. Disable anonymous authentication: --anonymous-auth=false. Enable audit logging with a comprehensive audit policy that captures all request and response bodies for sensitive resource types. Disable the insecure port (should be default off in Kubernetes 1.20+, but verify). Use the NodeRestriction admission controller to limit what kubelet can modify.
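
As a minimal sketch, the flags discussed above would appear as follows in a self-managed, kubeadm-style static pod manifest; the file paths are illustrative assumptions, and managed offerings expose equivalent settings through provider configuration rather than flags.

    # Excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm-style path, shown as an example)
    spec:
      containers:
        - name: kube-apiserver
          command:
            - kube-apiserver
            - --anonymous-auth=false                                   # reject unauthenticated requests
            - --enable-admission-plugins=NodeRestriction,PodSecurity,ResourceQuota
            - --audit-policy-file=/etc/kubernetes/audit-policy.yaml   # assumed path
            - --audit-log-path=/var/log/kubernetes/audit.log          # assumed path
            - --audit-log-maxage=30
            - --audit-log-maxbackup=10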

Etcd requires specific protections because it stores all cluster state, including secrets in base64 encoding. Etcd should only be accessible from the API server; no other component or operator should have direct etcd access. Enable etcd encryption at rest using the EncryptionConfiguration API with AES-GCM or AES-CBC for secret resources. Require mutual TLS for all etcd client connections. Back up etcd regularly and test restoration procedures; an etcd backup is the recovery path for most cluster disasters.
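
A minimal EncryptionConfiguration sketch for encrypting Secret resources with AES-GCM is shown below; the key material is a placeholder, and the identity provider is listed last so that data written before encryption was enabled remains readable during migration.

    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources:
          - secrets
        providers:
          - aesgcm:
              keys:
                - name: key1
                  secret: <base64-encoded 32-byte key>   # generate with a CSPRNG; never commit to version control
          - identity: {}                                  # read fallback for not-yet-encrypted data

Point the API server at this file with --encryption-provider-config, then rewrite existing Secrets (kubectl get secrets --all-namespaces -o json | kubectl replace -f -) so they are re-stored encrypted.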

For managed Kubernetes offerings, many control plane hardening controls are managed by the provider. Verify your provider's shared responsibility model and audit the controls you retain responsibility for.

API server: disable anonymous auth

Set --anonymous-auth=false. Anonymous requests bypass RBAC and should never be permitted in production clusters.

API server: enable audit logging

Deploy a comprehensive audit policy capturing metadata, request, and response levels for sensitive operations. Send audit logs to an external SIEM, not just cluster-local storage.
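
A starting-point audit policy might look like the sketch below: full request and response bodies for Secrets, ConfigMaps, and RBAC changes, metadata for everything else. The rule set is an assumption to tune for your environment; the first matching rule wins.

    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      - level: RequestResponse
        resources:
          - group: ""
            resources: ["secrets", "configmaps"]
          - group: "rbac.authorization.k8s.io"
            resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
      - level: Metadata                      # catch-all for every other request
        omitStages: ["RequestReceived"]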

etcd: enable encryption at rest

Configure EncryptionConfiguration with AES-GCM for the secrets resource. Without this, anyone who obtains an etcd backup can trivially decode every Secret in the cluster.

etcd: restrict network access

Firewall etcd to accept connections only from the API server. Use mTLS for all etcd client connections.

Control plane: enable admission controllers

Enable NodeRestriction, PodSecurity, and ResourceQuota admission controllers at minimum. Consider OPA/Gatekeeper or Kyverno for policy-as-code enforcement.

RBAC Hardening

Kubernetes RBAC is the primary access control mechanism for all cluster operations. It controls who can create, read, update, and delete every Kubernetes resource type. Misconfigured RBAC is the most common path to privilege escalation in Kubernetes environments.

The principle of least privilege applies to both human users and service accounts. No user or service account should have cluster-admin unless they genuinely require full cluster access. The cluster-admin ClusterRoleBinding should have fewer than five members in most production clusters. Audit it regularly.

Service accounts are the RBAC identity used by pods. By default, every pod is mounted with the default service account token for its namespace, and that token can be used to query the Kubernetes API. For pods that do not require API access, set automountServiceAccountToken: false in the pod spec. For pods that do require API access, create a dedicated service account with the minimum required permissions and bind it explicitly.
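
For a pod that never needs to call the Kubernetes API, a sketch of the opt-out looks like this (names and image are placeholders); the same field can also be set on the ServiceAccount object so that every pod using it opts out by default.

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-frontend                                  # illustrative name
    spec:
      automountServiceAccountToken: false                 # no API token mounted into the pod
      containers:
        - name: app
          image: registry.example.com/web-frontend:1.0    # placeholder image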

Wildcard permissions in roles are a common misconfiguration: verbs: ["*"] or resources: ["*"] grants far broader access than intended. Audit all ClusterRoles and Roles for wildcard permissions and replace them with explicit permission lists. Pay particular attention to permissions on secrets (any principal that can read Secrets in a namespace can decode every credential stored there), pods (the ability to create pods is effectively code execution in the namespace), and deployments (anyone who can create deployments can run arbitrary container images).
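
A scoped alternative to wildcards is an explicit Role bound to a dedicated service account, as in this sketch (namespace, names, and the permission list are illustrative):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: orders-reader
      namespace: orders
    rules:
      - apiGroups: [""]
        resources: ["pods", "pods/log"]       # explicit resources instead of "*"
        verbs: ["get", "list", "watch"]       # explicit verbs instead of "*"
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: orders-reader-binding
      namespace: orders
    subjects:
      - kind: ServiceAccount
        name: orders-api
        namespace: orders
    roleRef:
      kind: Role
      name: orders-reader
      apiGroup: rbac.authorization.k8s.io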

Use kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<name> to audit what permissions a specific service account holds (the system:serviceaccount: prefix is required when impersonating a service account rather than a user). Tools like rbac-audit, rbac-police, and KubeHound automate RBAC misconfiguration detection across large clusters.

Network Policy and Traffic Segmentation

By default, all pods in a Kubernetes cluster can communicate with all other pods across all namespaces. This flat network model means a compromised pod has network access to every other service in the cluster. Network Policies implement microsegmentation: explicit allow rules that define which pods can communicate with which other pods.

The baseline posture is a default-deny policy in every namespace, followed by explicit allow rules for required communication paths. A default-deny ingress NetworkPolicy that selects all pods in a namespace blocks all inbound traffic to those pods unless another NetworkPolicy explicitly allows it. The same pattern applies to egress.
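
A minimal default-deny sketch for one namespace (the namespace name is illustrative): the empty podSelector selects every pod, and listing both policy types with no rules denies all traffic in both directions.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: payments          # illustrative namespace
    spec:
      podSelector: {}              # empty selector = all pods in the namespace
      policyTypes:
        - Ingress
        - Egress

Note that default-deny egress also blocks DNS; add an explicit egress allow for cluster DNS (UDP and TCP port 53) or workloads will fail name resolution.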

Namespace isolation is the primary segmentation boundary. Workloads with different trust levels (public-facing services, internal APIs, databases) should be in separate namespaces with NetworkPolicies that enforce explicit communication paths. A NetworkPolicy permitting the web tier to reach the API tier, the API tier to reach the database, and no lateral communication within each tier significantly limits blast radius from a pod compromise.
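
An explicit allow rule for one of those paths might look like the following sketch, assuming tier labels and a namespace layout that are purely illustrative; it admits ingress to API-tier pods only from web-tier pods in the web namespace, on one port.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-web-to-api
      namespace: api
    spec:
      podSelector:
        matchLabels:
          tier: api
      policyTypes:
        - Ingress
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: web   # automatic namespace name label
              podSelector:
                matchLabels:
                  tier: web
          ports:
            - protocol: TCP
              port: 8080                             # assumed service port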

Standard Kubernetes NetworkPolicy does not support L7 filtering or cross-cluster policies. Where that is insufficient, service mesh solutions (Istio, Linkerd, Cilium) provide richer policy capabilities, including mTLS between pods, L7 authorization policies, and observability for all pod-to-pod traffic. Cilium's eBPF-based network policy enforcement also performs significantly better at scale than traditional iptables-based NetworkPolicy implementations.

Pod Security and Workload Hardening

Pod security controls restrict what containers can do at the Linux capability and system call level. The Kubernetes Pod Security Admission (PSA) controller, which replaced the deprecated PodSecurityPolicy (removed in Kubernetes 1.25), enforces three security profiles: Privileged (no restrictions), Baseline (prevents known privilege escalation), and Restricted (the strongest security posture).
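
PSA is configured per namespace through labels; a sketch that enforces Restricted while also warning and auditing against it (the namespace name is illustrative):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: payments                                        # illustrative
      labels:
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/enforce-version: latest
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/audit: restricted

Applying warn and audit at a stricter level than enforce is a common rollout pattern: violations surface in kubectl output and audit logs before the enforce label is tightened.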

Production workloads should run under the Restricted policy where possible. Restricted requires: running as a non-root user; dropping all capabilities (with only NET_BIND_SERVICE eligible to be added back); disabling privilege escalation (allowPrivilegeEscalation: false); and setting an explicit seccomp profile (RuntimeDefault or Localhost). Setting readOnlyRootFilesystem is not mandated by the profile but is a strongly recommended companion control.
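
A pod spec sketch that satisfies Restricted (names and image are placeholders; readOnlyRootFilesystem is included as good practice even though the profile does not require it):

    apiVersion: v1
    kind: Pod
    metadata:
      name: hardened-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001                        # assumed non-root UID baked into the image
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: registry.example.com/app:1.0   # placeholder image
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]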

For containers that cannot meet Restricted requirements immediately, Baseline prevents the most critical misconfigurations: running privileged containers, mounting host paths, using hostNetwork, hostPID, or hostIPC, and using dangerous capabilities like SYS_ADMIN.

Seccomp profiles restrict which system calls a container process can make. The runtime/default seccomp profile blocks uncommon and dangerous system calls without breaking most applications. For sensitive workloads, audit mode (SCMP_ACT_LOG rather than SCMP_ACT_ERRNO) can capture which syscalls your application actually uses before creating a custom allow-list profile.

Image security is a related control: only run images from trusted registries, validate image signatures before admission using Cosign and image policy webhooks, and scan all images for known vulnerabilities in the CI/CD pipeline before they reach production.
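
One way to enforce signature verification at admission is a policy engine rule; the sketch below uses a Kyverno ClusterPolicy (field names follow recent Kyverno releases, and the registry pattern and public key are placeholders, so treat it as illustrative rather than a drop-in policy).

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-signed-images
    spec:
      validationFailureAction: Enforce
      rules:
        - name: verify-signature
          match:
            any:
              - resources:
                  kinds: ["Pod"]
          verifyImages:
            - imageReferences:
                - "registry.example.com/*"          # assumed trusted registry
              attestors:
                - entries:
                    - keys:
                        publicKeys: |-
                          -----BEGIN PUBLIC KEY-----
                          <cosign public key>
                          -----END PUBLIC KEY-----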

Secrets Management and Runtime Detection

Kubernetes Secrets are base64-encoded by default, not encrypted. Any user or service account with read access to Secrets in a namespace can decode them trivially. This creates a critical gap between the name 'Secret' and its actual confidentiality properties.

Etcd encryption at rest (covered in the control plane section) is the first required control. The second is strict RBAC on Secret reads: no service account should have wildcard secret access, and human access to production secrets should go through a break-glass process with audit logging rather than direct kubectl access.

External secrets management integrates Kubernetes with dedicated secrets stores: HashiCorp Vault via the Vault Agent Injector or External Secrets Operator, AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager. The External Secrets Operator (ESO) synchronizes secrets from external stores into Kubernetes Secrets on a configurable schedule, enabling rotation at the secrets store level without application changes. This approach provides a centralized audit trail, rotation automation, and access controls that are more granular than what Kubernetes RBAC alone can enforce.
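
An ExternalSecret sketch illustrating the pattern (the store reference, names, paths, and refresh interval are all assumptions for a Vault-backed setup):

    apiVersion: external-secrets.io/v1beta1
    kind: ExternalSecret
    metadata:
      name: db-credentials
      namespace: payments
    spec:
      refreshInterval: 1h                    # re-sync cadence from the external store
      secretStoreRef:
        name: vault-backend                  # SecretStore/ClusterSecretStore defined separately
        kind: ClusterSecretStore
      target:
        name: db-credentials                 # Kubernetes Secret that ESO creates and keeps in sync
      data:
        - secretKey: password
          remoteRef:
            key: prod/payments/db            # path in the external store (illustrative)
            property: password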

Runtime threat detection provides the behavioral monitoring layer that static configuration controls cannot. Falco is the dominant open-source runtime security tool for Kubernetes: it uses eBPF or kernel module hooks to monitor system calls and generate alerts when container behavior deviates from expected patterns. Default Falco rules detect: shell execution inside containers, sensitive file reads, privilege escalation attempts, outbound connections to unexpected hosts, and cryptomining binary execution. Custom rules can be added for application-specific threat models. Feed Falco alerts to your SIEM for centralized correlation with other cluster and infrastructure events.
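
Custom rules live alongside the defaults; the sketch below, written in Falco's YAML rules format, alerts on interactive shells inside one workload's containers (the image pattern and tags are illustrative assumptions).

    - rule: Shell Spawned In Payments Container
      desc: Detect interactive shells inside payments workload containers
      condition: >
        spawned_process and container
        and proc.name in (bash, sh, zsh)
        and container.image.repository startswith "registry.example.com/payments"
      output: >
        Shell spawned in payments container
        (user=%user.name command=%proc.cmdline container=%container.name image=%container.image.repository)
      priority: WARNING
      tags: [container, shell, payments]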

The bottom line

Kubernetes security hardening is not a one-time task. New workloads introduce new RBAC bindings and pod specs that may violate security policies. New Kubernetes versions deprecate and introduce security controls. The cluster configuration that was hardened six months ago may have drifted. Implement continuous compliance scanning (kube-bench for CIS Benchmark controls, Trivy or the Trivy Operator, the successor to Starboard, for vulnerability scanning) as a recurring CI/CD gate and scheduled audit, not just a pre-launch checklist. The organizations that maintain strong Kubernetes security posture treat it as an ongoing operational discipline, not a project milestone.

Frequently asked questions

What is the CIS Kubernetes Benchmark and should we follow it?

The CIS Kubernetes Benchmark is a consensus-based configuration guide for securing Kubernetes clusters, published by the Center for Internet Security. It categorizes controls as Level 1 (basic security, minimal operational impact) and Level 2 (defense-in-depth, may have operational tradeoffs). Most security frameworks, including NIST CSF, SOC 2, and FedRAMP, reference CIS Benchmarks. You should follow it as the baseline, using kube-bench to automate compliance checking. For managed Kubernetes (EKS, AKS, GKE), the provider handles some control plane controls, and the benchmark has managed cluster-specific profiles.

What replaced PodSecurityPolicy and what should we use instead?

PodSecurityPolicy (PSP) was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), which enforces three built-in profiles: Privileged, Baseline, and Restricted. PSA is simpler to operate than PSP but less granular. For organizations that need PSP-level control (custom policies, exception handling, namespace-specific overrides), use OPA/Gatekeeper or Kyverno as policy engines alongside PSA.

What is the risk of running privileged containers?

A privileged container has full access to the host's Linux capabilities, including the ability to modify host file system mounts, access other containers' namespaces, load kernel modules, and interact directly with host network interfaces. A privileged container compromise is effectively equivalent to host compromise. Never run privileged containers in production unless the workload has a documented, unavoidable requirement (such as a node-level security agent). Even in those cases, use specific capability additions rather than the blanket privileged flag.

How do Kubernetes Network Policies actually work?

Kubernetes Network Policies are enforced by the CNI (Container Network Interface) plugin, not by Kubernetes itself. Standard policies define ingress and egress rules based on pod selectors, namespace selectors, and IP blocks. If your CNI plugin does not support NetworkPolicy (Flannel does not by default), policies you apply are silently ignored. Verify your CNI supports NetworkPolicy before relying on it. Calico, Cilium, Weave, and Canal all support standard NetworkPolicy enforcement.

What is kube-bench and how do we use it?

kube-bench is an open-source tool from Aqua Security that checks Kubernetes cluster configuration against the CIS Kubernetes Benchmark. It runs checks against the API server, etcd, kubelet, controller manager, and scheduler configurations and produces a report of passing, failing, and warning checks with remediation guidance. Run kube-bench as a scheduled job in your cluster and as part of your CI/CD pipeline for infrastructure-as-code changes to catch configuration drift before it reaches production.
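
A scheduled-run sketch, loosely adapted from the project's published Job manifest (image tag, schedule, and host paths are assumptions that vary by distribution; managed clusters typically expose fewer host paths):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: kube-bench
      namespace: kube-system
    spec:
      schedule: "0 3 * * 0"                  # weekly, illustrative
      jobTemplate:
        spec:
          template:
            spec:
              hostPID: true                  # kube-bench inspects host processes and config files
              restartPolicy: Never
              containers:
                - name: kube-bench
                  image: docker.io/aquasec/kube-bench:latest   # pin a specific version in practice
                  command: ["kube-bench"]
                  volumeMounts:
                    - name: etc-kubernetes
                      mountPath: /etc/kubernetes
                      readOnly: true
                    - name: var-lib-kubelet
                      mountPath: /var/lib/kubelet
                      readOnly: true
              volumes:
                - name: etc-kubernetes
                  hostPath:
                    path: /etc/kubernetes
                - name: var-lib-kubelet
                  hostPath:
                    path: /var/lib/kubelet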

Should we use a service mesh for Kubernetes security?

A service mesh (Istio, Linkerd, Cilium) adds significant security capabilities: mutual TLS between all pods (even without application-level TLS), L7-aware authorization policies, and full traffic observability. The tradeoff is operational complexity: service meshes add latency, resource overhead, and a new failure domain to manage. For organizations with strong security requirements (financial services, healthcare, regulated industries) or complex multi-tenant clusters, the security benefits justify the complexity. For simpler environments, standard NetworkPolicy with mTLS at the application layer is often sufficient.

How do we detect cryptomining in Kubernetes clusters?

Cryptomining is the most common payload in opportunistic Kubernetes compromises via exposed API servers and misconfigured CI/CD pipelines. Detection signals include: high CPU utilization on specific pods without corresponding application load; outbound connections to mining pool domains or IPs; execution of known mining binaries (xmrig, minerd); and unusual process names spawned inside containers. Falco rules for cryptomining are included in the default ruleset. Add network-level detection at the CNI layer for connections to known mining infrastructure. Monitor for unusual resource consumption in Kubernetes metrics (Prometheus) as an early-warning signal.

Sources & references

  1. CIS Kubernetes Benchmark v1.9
  2. NSA/CISA: Kubernetes Hardening Guidance
  3. NIST SP 800-190: Application Container Security Guide
  4. Falco: Runtime Security for Kubernetes
  5. Open Policy Agent: Kubernetes Admission Control

