Active Threat | September 17, 2026 | 11 min read

Kubernetes Security Best Practices for Production Environments

Sources: CIS Kubernetes Benchmark v1.9 | NSA/CISA Kubernetes Hardening Guide | OWASP Kubernetes Top 10

Eric Bang

Founder & Cybersecurity Evangelist

59%

Of Kubernetes environments have at least one exposed API server in security scans (2025)

78%

Of Kubernetes security incidents involve RBAC misconfigurations or credential exposure

94%

Of container images in production environments contain at least one known CVE

0

Network policies defined by default in a new Kubernetes cluster: all pod-to-pod traffic is allowed

A default Kubernetes cluster is not secure. By design, it favors ease of use over security posture: all pods can communicate with all other pods, service account tokens are mounted into every pod by default, the API server may be accessible from the internet, and secret values are stored base64-encoded (not encrypted) in etcd.

Hardening a Kubernetes cluster for production requires explicit configuration across six domains: API server access, RBAC, network policies, pod security, secrets management, and runtime threat detection. This guide covers the specific configurations that matter most in each domain.

API Server and Control Plane Hardening

The Kubernetes API server is the most sensitive component in the cluster — every control plane and workload operation passes through it. Unauthorized API server access is the most direct path to complete cluster compromise.

Enable API server audit logging. Kubernetes audit logs record every API request with the requesting user, resource, verb, and response code. Without audit logging, post-incident investigation of a cluster compromise is severely limited. Kubernetes defines four audit levels: None, Metadata, Request, and RequestResponse. Configure the audit policy to capture at minimum: all requests at the Metadata level (request metadata without request or response bodies), the Request level for modifications in general, and the RequestResponse level for changes to RBAC resources (roles, rolebindings, clusterroles, clusterrolebindings). Keep secrets and configmaps at the Metadata level, because the higher levels write their contents into the audit log.
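As a rough illustration of such a policy, the sketch below builds one as a Python dict and serializes it to JSON, which YAML parsers also accept. The rule ordering and the choice to keep secrets and configmaps at Metadata are assumptions of this sketch, not a prescribed baseline. The rendered file would be passed to the API server through its --audit-policy-file flag.

```python
import json

# Sketch of an audit policy. The first matching rule decides the level,
# so the most specific rules come first.
audit_policy = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    # Skip the RequestReceived stage to avoid a second event per request.
    "omitStages": ["RequestReceived"],
    "rules": [
        # Secrets and configmaps: metadata only, so values never reach the log.
        {
            "level": "Metadata",
            "resources": [{"group": "", "resources": ["secrets", "configmaps"]}],
        },
        # RBAC changes: record full request and response bodies.
        {
            "level": "RequestResponse",
            "verbs": ["create", "update", "patch", "delete"],
            "resources": [{
                "group": "rbac.authorization.k8s.io",
                "resources": ["roles", "rolebindings",
                              "clusterroles", "clusterrolebindings"],
            }],
        },
        # Any other modification: record the request body.
        {"level": "Request", "verbs": ["create", "update", "patch", "delete"]},
        # Everything else (reads): metadata only.
        {"level": "Metadata"},
    ],
}


def render(policy):
    """Serialize the policy to JSON text."""
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(render(audit_policy))
```

Because rules are evaluated in order, moving the catch-all Metadata rule to the top would silently downgrade everything below it.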

Restrict API server network access. The API server should not be accessible from the public internet unless you have a specific operational requirement for it. For managed Kubernetes services (EKS, AKS, GKE), use private cluster configurations that restrict API endpoint access to specific CIDR ranges or VPC endpoints. For self-managed clusters, place the API server behind a load balancer accessible only from management networks and VPN ranges.

Disable anonymous API server access. Ensure --anonymous-auth is set to false. In default configurations, unauthenticated requests are assigned to the system:anonymous user and system:unauthenticated group — in some cluster configurations, these have unintended permissions that allow enumeration or exploitation without credentials.

Enable etcd encryption at rest for secrets. By default, Kubernetes secrets are stored in etcd as base64-encoded strings, not encrypted. Enable EncryptionConfiguration to encrypt secret values at rest using AES-GCM or envelope encryption with a KMS provider.
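A minimal sketch of an EncryptionConfiguration follows, generated from Python with a fresh random key. The key name key1 and the 32-byte key size are assumptions for illustration; a KMS provider would replace the aesgcm entry in environments that have one. The trailing identity provider lets the API server keep reading secrets that were written before encryption was enabled.

```python
import base64
import json
import secrets


def new_key():
    """Return a random 32-byte key, base64-encoded as the config expects."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def encryption_config(key_b64):
    """Build an EncryptionConfiguration that encrypts secrets with AES-GCM."""
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # First provider is used for writes.
                {"aesgcm": {"keys": [{"name": "key1", "secret": key_b64}]}},
                # Fallback so existing unencrypted secrets stay readable.
                {"identity": {}},
            ],
        }],
    }


if __name__ == "__main__":
    print(json.dumps(encryption_config(new_key()), indent=2))
```

The file is referenced by the API server's --encryption-provider-config flag. Existing secrets are only encrypted once they are rewritten, and the generated file itself holds key material, so it needs the same protection as the etcd data.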

RBAC: Least Privilege for Service Accounts and Users

RBAC (Role-Based Access Control) is the primary access control mechanism in Kubernetes. Most RBAC problems fall into three patterns: overly permissive service accounts, wildcard resource permissions, and excessive use of cluster-admin.

Audit existing RBAC bindings regularly using kubectl auth can-i --list or tools like rbac-audit and rbac-lookup. Focus specifically on: any subject bound to cluster-admin (a very small set of administrators should have this), any role or clusterrole with wildcard verbs or resources (* permissions), any default service account with non-default permissions, and any external user or service account bound to roles that include escalation permissions (creating roles, creating rolebindings, impersonating users).
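Part of this audit can be scripted. The sketch below assumes the input is the JSON produced by kubectl get clusterroles -o json (or the equivalent for roles) and flags wildcard rules and escalation-related verbs. The function name and the sample data are hypothetical, and the check does not resolve which subjects are bound to each role.

```python
ESCALATION_VERBS = {"bind", "escalate", "impersonate"}


def risky_rules(role_list):
    """Return (role name, reason) pairs for wildcard or escalation rules."""
    findings = []
    for role in role_list.get("items", []):
        name = role["metadata"]["name"]
        # Aggregated ClusterRoles can carry rules: null, hence the fallback.
        for rule in role.get("rules") or []:
            verbs = rule.get("verbs", [])
            resources = rule.get("resources", [])
            if "*" in verbs:
                findings.append((name, "wildcard verbs"))
            if "*" in resources:
                findings.append((name, "wildcard resources"))
            for verb in sorted(ESCALATION_VERBS & set(verbs)):
                findings.append((name, "escalation verb: " + verb))
    return findings


# Hypothetical sample in the shape of `kubectl get clusterroles -o json`.
sample = {"items": [
    {"metadata": {"name": "ops-admin"},
     "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]},
    {"metadata": {"name": "config-reader"},
     "rules": [{"apiGroups": [""], "resources": ["configmaps"],
                "verbs": ["get", "list"]}]},
    {"metadata": {"name": "role-binder"},
     "rules": [{"apiGroups": ["rbac.authorization.k8s.io"],
                "resources": ["clusterroles"], "verbs": ["bind"]}]},
]}

if __name__ == "__main__":
    for name, reason in risky_rules(sample):
        print(name + ": " + reason)
```

Built-in roles such as cluster-admin will show up in the findings by design; the useful signal is any custom role that appears alongside them.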

Service accounts used by application workloads should have the minimum permissions required for the workload to function — typically read access to specific configmaps or secrets, and sometimes get/list permissions for specific resource types. Create dedicated service accounts per workload rather than reusing the default service account. The default service account in each namespace is automatically mounted in pods unless explicitly disabled — set automountServiceAccountToken: false in service account definitions for workloads that do not need API access.
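One way to express a dedicated, narrowly scoped service account is sketched below. The workload, namespace, and configmap names are placeholders, and read access to a single configmap is only an example of a minimal permission set.

```python
import json


def workload_identity(name, namespace, configmap):
    """Build a ServiceAccount, Role and RoleBinding scoped to one configmap."""
    meta = {"name": name, "namespace": namespace}
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": dict(meta),
        # Pods using this account get no API token unless they opt in.
        "automountServiceAccountToken": False,
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": dict(meta),
        "rules": [{
            "apiGroups": [""],
            "resources": ["configmaps"],
            "resourceNames": [configmap],  # one named object, not all configmaps
            "verbs": ["get"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": dict(meta),
        "subjects": [{"kind": "ServiceAccount", "name": name,
                      "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": name},
    }
    return {"apiVersion": "v1", "kind": "List",
            "items": [service_account, role, binding]}


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(json.dumps(workload_identity("orders-api", "shop", "orders-config"),
                     indent=2))
```

A pod that genuinely needs the API can still set automountServiceAccountToken: true in its own spec, which takes precedence over the service account setting, so the default stays closed and access becomes an explicit choice per workload.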

For human access, use short-lived credentials rather than static kubeconfig files with long-lived certificates. Managed Kubernetes services provide IAM-based authentication (aws eks get-token for EKS, az aks get-credentials with AAD integration for AKS) that ties cluster access to your identity provider's session management.

Network Policy and Pod Security Standards

Kubernetes has no network policy enforced by default — all pods can communicate with all other pods across all namespaces. Network policies require a CNI plugin that supports policy enforcement (Calico, Cilium, Weave Net) and must be explicitly configured.

Start with a default-deny ingress policy in every namespace that applies to all pods. Then add allow rules for the specific traffic flows required. A common namespace segmentation pattern: default-deny-ingress policy in all namespaces, explicit allow policies for service-to-service communication within the namespace, explicit allow policies for cross-namespace communication required by the application architecture, and allow policy for ingress from the ingress controller namespace.
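The pattern above can be sketched as two NetworkPolicy objects: a default-deny for the namespace and one allow rule for traffic from the ingress controller. The namespace names, the app label, and the ingress-nginx namespace are assumptions. The sketch relies on the kubernetes.io/metadata.name label that recent Kubernetes versions set on every namespace.

```python
import json


def default_deny_ingress(namespace):
    """Deny all ingress to every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector matches all pods
            "policyTypes": ["Ingress"],  # no ingress rules listed, so none allowed
        },
    }


def allow_from_namespace(namespace, app_label, source_namespace):
    """Allow ingress to pods labelled app=<app_label> from one namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-" + source_namespace + "-to-" + app_label,
                     "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    "namespaceSelector": {"matchLabels": {
                        "kubernetes.io/metadata.name": source_namespace,
                    }},
                }],
            }],
        },
    }


if __name__ == "__main__":
    policies = [
        default_deny_ingress("shop"),
        allow_from_namespace("shop", "web", "ingress-nginx"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies},
                     indent=2))
```

Network policies are additive, so the allow policy opens only the listed flow on top of the default-deny. Neither object has any effect unless the cluster's CNI plugin enforces policies.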

For east-west traffic between services, Cilium's network policy based on service identity (pod labels) rather than IP addresses is significantly more maintainable than IP-based policies — pod IPs change on restart, but label selectors are stable.

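Pod Security Standards (PSS), enforced by the built-in Pod Security Admission (PSA) controller that replaced PodSecurityPolicy (deprecated in 1.21, removed in 1.25), define three policy levels: Privileged (no restrictions), Baseline (prevents known privilege escalations), and Restricted (follows current pod hardening best practices). Apply Baseline or Restricted to all production namespaces. Baseline already blocks privileged containers, host namespace sharing, and hostPath mounts. Restricted adds: running as a non-root user, no privilege escalation (allowPrivilegeEscalation: false), all Linux capabilities dropped (only NET_BIND_SERVICE may be added back), a limited set of volume types, and a seccomp profile set to RuntimeDefault or Localhost. A read-only root filesystem is not part of the Restricted profile, but it is worth setting wherever the workload allows it.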
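Pod Security Admission is switched on per namespace through labels. The sketch below shows a namespace enforcing the Restricted level together with a container securityContext intended to pass it. The namespace name is a placeholder, and readOnlyRootFilesystem is included as extra hardening rather than something the Restricted level itself checks.

```python
import json


def restricted_namespace(name):
    """Namespace that enforces, warns on and audits the Restricted level."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                "pod-security.kubernetes.io/enforce": "restricted",
                "pod-security.kubernetes.io/enforce-version": "latest",
                "pod-security.kubernetes.io/warn": "restricted",
                "pod-security.kubernetes.io/audit": "restricted",
            },
        },
    }


# Container-level securityContext aimed at the Restricted level.
RESTRICTED_CONTAINER_CONTEXT = {
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
    # Not required by Restricted; additional hardening where the app allows it.
    "readOnlyRootFilesystem": True,
}

if __name__ == "__main__":
    print(json.dumps(restricted_namespace("shop"), indent=2))
    print(json.dumps(RESTRICTED_CONTAINER_CONTEXT, indent=2))
```

Setting only the warn and audit labels first is a low-risk way to see which existing workloads would be rejected before adding enforce.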

For policy enforcement beyond PSA, OPA Gatekeeper and Kyverno provide admission control that can enforce custom policies: require specific labels on all pods, enforce image registry restrictions (only allow images from your internal registry), block containers with critical CVEs (integrating with image scanning), and enforce resource request/limit requirements.

Secrets Management and Runtime Threat Detection

Kubernetes native secrets provide base64 encoding, not encryption. Even with etcd encryption at rest, secrets mounted as environment variables or volumes in pods are readable by anyone who can exec into the container. For sensitive secrets (database credentials, API keys, certificates), integrate an external secrets manager.

External Secrets Operator (ESO) synchronizes secrets from AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, and GCP Secret Manager into Kubernetes secrets, allowing workloads to consume secrets in the standard Kubernetes pattern while the external manager remains the source of truth for rotation and audit logging. The synced copy is still a regular Kubernetes secret stored in etcd, so etcd encryption at rest and tight RBAC on secrets remain necessary. Vault Agent Sidecar Injector is an alternative that delivers secrets directly to the pod filesystem without storing them in Kubernetes secrets at all.
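For orientation, an ExternalSecret resource might look roughly like the sketch below. The apiVersion shown (external-secrets.io/v1beta1) depends on the installed ESO release, and the store name, remote key, and property names are all placeholders. The resource tells ESO which remote entries to copy into which Kubernetes secret and how often to refresh them.

```python
import json


def external_secret(name, namespace, store, remote_key, properties):
    """Build an ExternalSecret that maps remote properties to secret keys."""
    return {
        # Assumed API version; check the CRD version your ESO release serves.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # how often ESO re-reads the source
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            "target": {"name": name},  # Kubernetes secret that ESO manages
            "data": [
                {"secretKey": prop,
                 "remoteRef": {"key": remote_key, "property": prop}}
                for prop in properties
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder store and key names for illustration only.
    manifest = external_secret("orders-db", "shop", "company-secrets",
                               "prod/orders/db", ["username", "password"])
    print(json.dumps(manifest, indent=2))
```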

For runtime threat detection, Falco (CNCF project) is the standard open-source solution. Falco uses kernel system call tracing to detect container escape attempts, privilege escalation, unexpected file system modifications, unexpected network connections, and known malicious tool execution (cryptominers, reverse shells). Deploy Falco as a DaemonSet and route alerts to your SIEM or alerting platform. Commercial alternatives include Sysdig Secure (built on Falco) and Aqua Security, which add vulnerability management, compliance, and SIEM integration to runtime detection.
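Routing usually involves some filtering so that low-priority events do not flood the SIEM. The sketch below assumes Falco is configured for JSON output, with one event per line carrying a priority field, and forwards only events at or above a chosen level. The function name, the Warning threshold, and the sample events are illustrative.

```python
import json

# Falco priority levels, most severe first.
PRIORITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Informational", "Debug"]


def should_forward(line, minimum="Warning"):
    """Return True if a JSON-formatted Falco event meets the minimum priority."""
    event = json.loads(line)
    priority = event.get("priority")
    if priority not in PRIORITIES:
        # Unknown or missing priority: forward rather than drop silently.
        return True
    return PRIORITIES.index(priority) <= PRIORITIES.index(minimum)


if __name__ == "__main__":
    # Made-up events in the general shape of Falco's JSON output.
    samples = [
        json.dumps({"rule": "Example critical rule", "priority": "Critical",
                    "output": "example output"}),
        json.dumps({"rule": "Example notice rule", "priority": "Notice",
                    "output": "example output"}),
    ]
    for line in samples:
        print(should_forward(line), json.loads(line)["rule"])
```

Forwarding events with an unrecognized priority is a deliberate choice here: a parsing gap should produce noise rather than a missed alert.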

Container image supply chain security has become a critical control area post-SolarWinds and post-XZ Utils backdoor. Implement image signing (Cosign from the Sigstore project) and signature verification at admission (using Kyverno or OPA Gatekeeper policies) to ensure only images signed by your CI/CD pipeline can run in production. Combine with SCA scanning in CI/CD to block images with critical CVEs before they reach the registry.
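As one possible shape for the admission side, the sketch below builds a Kyverno ClusterPolicy with a verifyImages rule that checks Cosign signatures against a public key. The registry pattern and the key are placeholders, and the field names should be checked against the Kyverno version in use, since the policy schema has changed between releases.

```python
import json


def require_signed_images(image_pattern, public_key_pem):
    """Build a Kyverno policy that rejects pods whose images lack a valid signature."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "require-signed-images"},
        "spec": {
            # Enforce blocks the pod; Audit would only report the violation.
            "validationFailureAction": "Enforce",
            "rules": [{
                "name": "verify-cosign-signature",
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "verifyImages": [{
                    "imageReferences": [image_pattern],
                    "attestors": [{
                        "entries": [{"keys": {"publicKeys": public_key_pem}}],
                    }],
                }],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder registry and key; substitute the CI/CD signing key.
    placeholder_key = ("-----BEGIN PUBLIC KEY-----\n"
                       "...\n"
                       "-----END PUBLIC KEY-----")
    print(json.dumps(
        require_signed_images("registry.example.com/*", placeholder_key),
        indent=2))
```

On the pipeline side, the matching step is signing the pushed image with the corresponding Cosign private key, so that only images built by CI/CD carry a signature this policy accepts.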

Restrict API server access to management networks only — never expose to the internet

Public API server access is the most direct path to cluster compromise. Private cluster endpoints eliminate this attack surface.

Apply default-deny network policies in every namespace before deploying workloads

Default allow is not a network security model. Explicit allowlists for required traffic flows are the correct pattern.

Disable default service account token automounting for workloads that do not need API access

Automounted tokens give every compromised pod potential RBAC permissions. Disable for all workloads that do not require API server access.

Deploy Falco for runtime syscall-level threat detection

RBAC and network policies prevent known-bad patterns. Falco detects behavioral anomalies at runtime that policy cannot anticipate.

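Use External Secrets Operator to sync secrets from a secrets manager rather than managing them only in Kubernetes

Kubernetes native secrets are base64-encoded in etcd by default. External secrets managers provide encryption, rotation, and audit logging; keep etcd encryption at rest enabled, because synced secrets are still stored in etcd.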

The bottom line

Kubernetes security is a layered problem: each layer (control plane, RBAC, network, pod security, secrets, runtime) can be hardened independently but all layers need attention. The NSA/CISA Kubernetes Hardening Guide and the CIS Kubernetes Benchmark both provide prescriptive configuration checklists that cover the most impactful controls — run your cluster against the CIS benchmark using kube-bench (open source) to identify gaps before going to production.

Frequently asked questions

What is the CIS Kubernetes Benchmark?

The CIS Kubernetes Benchmark is a prescriptive configuration guide published by the Center for Internet Security covering API server configuration, controller manager settings, scheduler settings, etcd configuration, kubelet configuration, and general policies. kube-bench (github.com/aquasecurity/kube-bench) is a free open-source tool that runs CIS benchmark checks against your cluster and reports each check as PASS, FAIL, WARN, or INFO. On managed Kubernetes services (EKS, AKS, GKE), control plane components are not accessible for direct CIS benchmark scanning, so use the cloud provider's security benchmarks for those components.

What replaced PodSecurityPolicy in Kubernetes?

PodSecurityPolicy (PSP) was deprecated in Kubernetes 1.21 and removed in 1.25. It was replaced by Pod Security Admission (PSA), a built-in admission controller that enforces the Privileged, Baseline, or Restricted pod security standards at the namespace level. PSA is simpler than PSP but less flexible. Organizations that need more granular policy enforcement (custom rules, image registry restrictions, label requirements) use OPA Gatekeeper or Kyverno as admission controllers alongside PSA.

How do I scan Kubernetes workloads for vulnerabilities?

Container vulnerability scanning operates at two layers: image scanning (scanning the container image layers for known CVEs before or at deployment) and runtime scanning (detecting vulnerable packages in running containers). For image scanning, integrate Trivy, Snyk Container, or Amazon Inspector scanning for ECR into your CI/CD pipeline as a blocking gate for critical CVEs. For managed Kubernetes clusters, Amazon Inspector (EKS), Microsoft Defender for Containers (AKS), and GKE Security Posture scanning provide cloud-native runtime vulnerability visibility.

What is the difference between Falco and OPA Gatekeeper?

Falco is a runtime security tool that detects threats in running containers by monitoring system calls. It answers the question: is something malicious happening right now in this pod? OPA Gatekeeper is an admission controller that enforces policies at deploy time. It answers the question: should this pod be allowed to run based on its configuration? Both are necessary: Gatekeeper prevents known-bad configurations from being deployed; Falco detects runtime behavior that Gatekeeper cannot anticipate. Deploy both in production Kubernetes environments.

Author

Eric Bang, CISSP

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
