Practitioner Guide | May 14, 2026 | 14 min read

Cloud Forensics and Incident Response: A Practitioner Guide

Sources: AWS Security Incident Response Guide | Microsoft Azure Security Incident Response | Google Cloud Incident Response | CISA Cloud Security Technical Reference Architecture

Eric Bang

Founder & Cybersecurity Evangelist

82% of breaches involved cloud assets (Verizon DBIR 2024)

40% of cloud incidents involve compromised identity/credentials (CrowdStrike 2024)

93 days: average cloud attacker dwell time before detection (IBM 2023)

70% of cloud environments lack sufficient logging for effective incident investigation

Cloud incident response is not on-premises IR with a different console. Cloud environments introduce ephemeral compute, API-driven infrastructure, shared-responsibility log ownership, and identity as the primary attack surface. A compromised EC2 instance that is auto-scaled out and terminated takes its volatile forensic evidence with it. Attackers who move through cloud environments using stolen API credentials may leave no traditional indicators — just API call patterns in management plane logs. Effective cloud IR requires cloud-specific evidence collection procedures, knowledge of which log sources matter for each provider, and investigation workflows designed for environments where infrastructure can disappear.

Cloud vs. On-Premises IR: Key Differences

Several assumptions from on-premises incident response do not hold in cloud environments.

Ephemeral compute

Cloud instances can be terminated at any time by auto-scaling, infrastructure-as-code redeployment, or the attacker covering tracks. In on-premises IR, the affected system is typically preserved for forensic imaging; in the cloud, forensic evidence must be captured immediately when an incident is detected, before the instance disappears.

Identity is the perimeter

Cloud attacks primarily use stolen or misconfigured IAM credentials rather than network-based techniques. An attacker with valid cloud credentials operates as a legitimate principal from the cloud provider's perspective. Investigation focuses on IAM activity — which credentials were used, what actions were taken, what lateral movement occurred through role assumptions.

Shared responsibility for logs

Cloud providers collect some logs by default; others require explicit enablement. AWS CloudTrail management events are on by default; S3 data events and Lambda invocation events require opt-in. Azure Activity Log is on by default; Entra ID sign-in logs are recorded by default but kept only for a short window unless diagnostic settings export them. Gaps in log configuration mean gaps in investigation capability, and they are frequently discovered mid-incident.

API-first attack surface

Cloud management plane attacks use provider APIs (AWS API, Azure Resource Manager, GCP Cloud APIs). These calls are logged differently from traditional OS-level activity. SIEM rules designed for Windows Event Logs do not detect AWS API call patterns that indicate credential misuse or privilege escalation.

Multi-cloud lateral movement

Sophisticated attackers pivot across cloud providers using federation. A compromised AWS role with OIDC federation to Azure, or an Entra ID service principal with GCP workload identity federation, enables movement between cloud environments. IR scope must consider cross-cloud paths.

Critical Log Sources by Cloud Provider

Knowing which logs to collect before they expire is the most operationally important cloud IR skill.

AWS — CloudTrail

CloudTrail records AWS API activity. Management events (default-on) cover control plane actions such as IAM changes, instance launches, and security group modifications. Data events (opt-in) cover S3 object operations and Lambda invocations. CloudTrail logs are the primary evidence source for every AWS investigation. Retention: 90 days in CloudTrail Event history; longer if a trail delivers the logs to S3. Enable multi-region trails and CloudTrail log file validation (integrity checking).
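As a rough illustration of what triage on these logs can look like, the sketch below parses one gzipped CloudTrail log file (a JSON document with a top-level "Records" list) and counts events per principal. The function names and the synthetic sample are hypothetical; in practice the bytes would come from a file downloaded from the trail's S3 bucket.

```python
import gzip
import io
import json
from collections import Counter


def load_cloudtrail_records(raw_bytes):
    """Parse one gzipped CloudTrail log file and return its list of event records."""
    with gzip.open(io.BytesIO(raw_bytes), "rt", encoding="utf-8") as handle:
        return json.load(handle).get("Records", [])


def summarize_by_principal(records):
    """Count (principal, eventName) pairs so unusual callers or actions stand out."""
    counts = Counter()
    for record in records:
        identity = record.get("userIdentity") or {}
        principal = identity.get("arn") or identity.get("type", "unknown")
        counts[(principal, record.get("eventName", "unknown"))] += 1
    return counts


# Synthetic records standing in for a real log file (illustrative values only).
ALICE = "arn:aws:iam::111122223333:user/alice"
SAMPLE_FILE = gzip.compress(json.dumps({"Records": [
    {"eventName": "DescribeInstances", "userIdentity": {"type": "IAMUser", "arn": ALICE}},
    {"eventName": "DescribeInstances", "userIdentity": {"type": "IAMUser", "arn": ALICE}},
    {"eventName": "CreateAccessKey", "userIdentity": {"type": "IAMUser", "arn": ALICE}},
]}).encode("utf-8"))

if __name__ == "__main__":
    summary = summarize_by_principal(load_cloudtrail_records(SAMPLE_FILE))
    for (principal, event_name), count in summary.most_common():
        print(count, principal, event_name)
```

Counting by principal and event name is only a first pass; it narrows attention before a detailed timeline is built.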

AWS — VPC Flow Logs

Network flow logs for VPC traffic: source/destination IP, port, protocol, bytes transferred. Not payload inspection — metadata only. Essential for establishing network communication patterns during an incident, identifying C2 communication, and lateral movement between instances.
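To make the metadata-only nature concrete, here is a minimal sketch that parses flow log lines in the default (version 2) space-separated format and totals accepted bytes per destination for one source address. It assumes the default field order; custom-format flow logs would need a different field list. The helper names and sample lines are illustrative.

```python
from collections import defaultdict

# Field order of the default (version 2) VPC Flow Log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_line(line):
    """Return a dict for one flow record, or None for malformed or no-data lines."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    if record["log_status"] != "OK":  # NODATA and SKIPDATA records carry "-" placeholders
        return None
    record["bytes"] = int(record["bytes"])
    record["dstport"] = int(record["dstport"])
    return record


def bytes_by_destination(lines, source_ip):
    """Total accepted bytes sent by source_ip, keyed by (destination IP, destination port)."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_flow_line(line)
        if record and record["srcaddr"] == source_ip and record["action"] == "ACCEPT":
            totals[(record["dstaddr"], record["dstport"])] += record["bytes"]
    return dict(totals)


# Synthetic sample lines (documentation IP ranges, illustrative values only).
SAMPLE_LINES = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 203.0.113.12 49152 443 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 203.0.113.12 49153 443 6 10 1000 1418530070 1418530130 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 49154 3389 6 1 40 1418530070 1418530130 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]

if __name__ == "__main__":
    print(bytes_by_destination(SAMPLE_LINES, "172.31.16.139"))
```

A large byte total toward a single external address is a lead for C2 or exfiltration review, not proof of either.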

AWS — GuardDuty findings

GuardDuty is AWS's managed threat detection service. Findings represent anomaly-detected activity: unusual API call patterns, cryptomining indicators, DNS queries to known malicious domains, compromised IAM credential use from unexpected geographies. GuardDuty findings are starting points for investigation, not evidence by themselves.

Azure — Activity Log

Azure Activity Log records all Azure Resource Manager (ARM) operations: resource creation/deletion, role assignment changes, policy modifications. The equivalent of AWS CloudTrail for the Azure control plane. Default retention is 90 days; ship to Log Analytics workspace for longer retention.

Azure — Entra ID Sign-in Logs

Sign-in logs record authentication events to Entra ID: interactive user, non-interactive user, service principal, and managed identity sign-ins. Essential for investigating compromised identity, credential stuffing, token theft, and MFA bypass. Default retention: 7 days (Entra ID Free), 30 days (P1/P2). Export to a Log Analytics workspace through diagnostic settings before retention expires.

Azure — Unified Audit Log (M365)

Captures activity across Exchange Online, SharePoint, OneDrive, Teams, and Entra ID. Critical for BEC investigations, data exfiltration from M365, and OAuth app abuse. Available in the Microsoft Purview compliance portal. Default retention: 180 days for Audit (Standard) licenses such as E3 (raised from 90 days in October 2023), 1 year (E5), 10 years with add-on.

GCP — Cloud Audit Logs

Four audit log types: Admin Activity (default-on), Data Access (opt-in, high volume), System Event (default-on), and Policy Denied (default-on). Admin Activity logs all control plane API calls. Data Access logs are required for investigating data exfiltration but generate significant volume requiring storage planning.

Cloud-Specific Attack Patterns to Investigate

Cloud attacks have characteristic patterns that differ from on-premises TTPs. Recognition accelerates investigation.

Credential compromise via IMDSv1

EC2 Instance Metadata Service v1 (IMDSv1) allows any process on the instance to retrieve IAM credentials without authentication. Server-Side Request Forgery (SSRF) vulnerabilities in web applications have been used to steal IAM credentials from the metadata service. Look for unusual GetCallerIdentity calls followed by API activity from unexpected IP addresses, or investigate SSRF vulnerabilities in applications running on EC2. Enforce IMDSv2 on all instances (metadata token required).
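One way to hunt for stolen instance credentials, sketched here under stated assumptions: instance-role sessions appear in CloudTrail with the instance ID as the role session name, so calls from those sessions whose source IP falls outside the networks the instance normally egresses through deserve a look. The expected network list (for example a NAT gateway address) is an assumption the analyst must supply, and the function name is hypothetical.

```python
import ipaddress
import re

# Instance-role sessions look like arn:aws:sts::<acct>:assumed-role/<role>/<instance-id>.
INSTANCE_SESSION = re.compile(r":assumed-role/[^/]+/(i-[0-9a-f]{8,17})$")


def off_instance_credential_use(records, expected_networks):
    """Return (instance_id, source_ip, event_name) for instance-role calls from unexpected IPs."""
    networks = [ipaddress.ip_network(net) for net in expected_networks]
    hits = []
    for record in records:
        arn = (record.get("userIdentity") or {}).get("arn", "")
        match = INSTANCE_SESSION.search(arn)
        if not match:
            continue
        try:
            source = ipaddress.ip_address(record.get("sourceIPAddress", ""))
        except ValueError:
            # sourceIPAddress can be an AWS service name such as "ec2.amazonaws.com".
            continue
        if not any(source in net for net in networks):
            hits.append((match.group(1), str(source), record.get("eventName")))
    return hits


# Synthetic events (documentation IP ranges, illustrative values only).
ROLE_SESSION = "arn:aws:sts::111122223333:assumed-role/WebAppRole/i-0123456789abcdef0"
SAMPLE_EVENTS = [
    {"eventName": "GetCallerIdentity", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"arn": ROLE_SESSION}},
    {"eventName": "ListBuckets", "sourceIPAddress": "203.0.113.50",
     "userIdentity": {"arn": ROLE_SESSION}},
    {"eventName": "AssumeRole", "sourceIPAddress": "ec2.amazonaws.com",
     "userIdentity": {"arn": ROLE_SESSION}},
    {"eventName": "ListBuckets", "sourceIPAddress": "203.0.113.50",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
]

if __name__ == "__main__":
    for hit in off_instance_credential_use(SAMPLE_EVENTS, ["198.51.100.7/32"]):
        print(hit)
```

Traffic through VPC endpoints or newly added egress paths can produce false positives, so the expected network list needs to be kept current.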

Privilege escalation via IAM misconfigurations

Common AWS privilege escalation paths: iam:PassRole + ec2:RunInstances (create instance with a more privileged role), iam:CreatePolicyVersion (create a new policy version with admin permissions), sts:AssumeRole to a less-restricted role. Investigate unexpected role assumption chains and IAM policy modifications in CloudTrail.
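Role assumption chains can be reconstructed from AssumeRole events. The sketch below is one possible approach: it builds caller-to-role edges from the event's userIdentity.arn and requestParameters.roleArn, maps assumed-role session ARNs back to a role ARN so hops link together, and walks the chains from a starting principal. Function names and sample data are hypothetical, and IAM role paths are not preserved by the normalization.

```python
import re

ASSUMED_ROLE = re.compile(r"^arn:aws:sts::(\d{12}):assumed-role/([^/]+)/.+$")


def normalize(arn):
    """Map an assumed-role session ARN to its role ARN; leave other ARNs unchanged."""
    match = ASSUMED_ROLE.match(arn)
    if match:
        return f"arn:aws:iam::{match.group(1)}:role/{match.group(2)}"
    return arn


def assumption_edges(records):
    """Build {caller: {target role ARN, ...}} from successful AssumeRole events."""
    edges = {}
    for record in records:
        if record.get("eventName") != "AssumeRole" or record.get("errorCode"):
            continue
        caller = normalize((record.get("userIdentity") or {}).get("arn", ""))
        target = (record.get("requestParameters") or {}).get("roleArn")
        if caller and target:
            edges.setdefault(caller, set()).add(target)
    return edges


def chains_from(start, edges):
    """Return every maximal assumption path that begins at start (cycles are cut)."""
    paths = []

    def walk(node, path):
        next_hops = [t for t in sorted(edges.get(node, ())) if t not in path]
        if not next_hops:
            if len(path) > 1:
                paths.append(path)
            return
        for target in next_hops:
            walk(target, path + [target])

    walk(start, [start])
    return paths


# Synthetic events (illustrative account IDs and role names).
ALICE = "arn:aws:iam::111122223333:user/alice"
DEPLOY = "arn:aws:iam::111122223333:role/Deploy"
ADMIN = "arn:aws:iam::444455556666:role/Admin"
SAMPLE_EVENTS = [
    {"eventName": "AssumeRole", "userIdentity": {"arn": ALICE},
     "requestParameters": {"roleArn": DEPLOY}},
    {"eventName": "AssumeRole",
     "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/Deploy/session1"},
     "requestParameters": {"roleArn": ADMIN}},
    {"eventName": "AssumeRole", "errorCode": "AccessDenied", "userIdentity": {"arn": ALICE},
     "requestParameters": {"roleArn": "arn:aws:iam::111122223333:role/Billing"}},
]

if __name__ == "__main__":
    for path in chains_from(ALICE, assumption_edges(SAMPLE_EVENTS)):
        print(" -> ".join(path))
```

Denied attempts are skipped here to keep the chain clean, but they are worth reviewing separately because they show what the attacker tried.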

S3 data exfiltration

Attackers with S3 read permissions can exfiltrate data by copying objects to attacker-controlled S3 buckets or enabling S3 bucket replication to external accounts. Indicators: GetObject calls at high volume, PutBucketReplication API calls, S3 bucket ACL modifications to enable public access. The GetObject indicator requires S3 data events enabled in CloudTrail; replication and ACL changes are bucket-level operations recorded as management events.
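The indicators above can be checked with a short pass over CloudTrail records, sketched here with hypothetical names. The read threshold is an assumption to tune per environment, and the GetObject counts only exist if S3 data events were enabled when the activity occurred.

```python
from collections import Counter

# Bucket-level changes named in the text; both are recorded as management events.
CONFIG_EVENTS = {"PutBucketReplication", "PutBucketAcl"}


def s3_exfil_indicators(records, read_threshold):
    """Return (heavy readers, bucket configuration changes) from CloudTrail S3 events."""
    reads = Counter()
    config_changes = []
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        principal = (record.get("userIdentity") or {}).get("arn", "unknown")
        bucket = (record.get("requestParameters") or {}).get("bucketName", "unknown")
        name = record.get("eventName")
        if name == "GetObject":
            reads[(principal, bucket)] += 1
        elif name in CONFIG_EVENTS:
            config_changes.append((record.get("eventTime"), principal, name, bucket))
    heavy = {key: count for key, count in reads.items() if count >= read_threshold}
    return heavy, config_changes


def _event(name, principal, bucket, when="2024-05-01T12:00:00Z"):
    """Build a synthetic S3 event for the sample below (illustrative only)."""
    return {"eventSource": "s3.amazonaws.com", "eventName": name, "eventTime": when,
            "userIdentity": {"arn": principal}, "requestParameters": {"bucketName": bucket}}


MALLORY = "arn:aws:iam::111122223333:user/mallory"
ALICE = "arn:aws:iam::111122223333:user/alice"
SAMPLE_EVENTS = (
    [_event("GetObject", MALLORY, "customer-data") for _ in range(3)]
    + [_event("GetObject", ALICE, "customer-data")]
    + [_event("PutBucketReplication", MALLORY, "customer-data", "2024-05-01T12:05:00Z")]
)

if __name__ == "__main__":
    heavy, changes = s3_exfil_indicators(SAMPLE_EVENTS, read_threshold=3)
    print(heavy)
    print(changes)
```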

Persistence via IAM backdoors

Attackers establish persistence by creating new IAM users, adding access keys to existing users, attaching permissive inline policies, or creating OIDC identity provider trust relationships to attacker-controlled external IdPs. Look for CreateUser, CreateAccessKey, PutUserPolicy, and CreateOpenIDConnectProvider events in CloudTrail following initial compromise.
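A minimal sketch of that search: filter IAM events for the four event names listed above, keep only successful calls made at or after the estimated time of initial compromise, and sort them into a timeline. The cutoff time is an analyst estimate, and the function names are hypothetical.

```python
from datetime import datetime, timezone

PERSISTENCE_EVENTS = {
    "CreateUser", "CreateAccessKey", "PutUserPolicy", "CreateOpenIDConnectProvider",
}


def parse_time(value):
    """Parse a CloudTrail eventTime such as 2024-05-01T12:00:00Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def persistence_timeline(records, compromise_time):
    """Return sorted (time, event, actor, target) rows for persistence-related IAM calls."""
    cutoff = parse_time(compromise_time)
    rows = []
    for record in records:
        if record.get("eventSource") != "iam.amazonaws.com":
            continue
        if record.get("eventName") not in PERSISTENCE_EVENTS or record.get("errorCode"):
            continue
        when = parse_time(record["eventTime"])
        if when < cutoff:
            continue
        params = record.get("requestParameters") or {}
        # userName is set for user-related calls; url is set for OIDC provider creation.
        target = params.get("userName") or params.get("url") or "unknown"
        actor = (record.get("userIdentity") or {}).get("arn", "unknown")
        rows.append((when, record["eventName"], actor, target))
    return sorted(rows)


# Synthetic events (illustrative values only).
ACTOR = "arn:aws:iam::111122223333:user/alice"
SAMPLE_EVENTS = [
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateAccessKey",
     "eventTime": "2024-05-01T13:10:00Z", "userIdentity": {"arn": ACTOR},
     "requestParameters": {"userName": "svc-backup"}},
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateUser",
     "eventTime": "2024-05-01T13:05:00Z", "userIdentity": {"arn": ACTOR},
     "requestParameters": {"userName": "svc-backup"}},
    {"eventSource": "iam.amazonaws.com", "eventName": "CreateUser",
     "eventTime": "2024-04-01T09:00:00Z", "userIdentity": {"arn": ACTOR},
     "requestParameters": {"userName": "new-hire"}},
    {"eventSource": "iam.amazonaws.com", "eventName": "ListUsers",
     "eventTime": "2024-05-01T13:00:00Z", "userIdentity": {"arn": ACTOR},
     "requestParameters": None},
]

if __name__ == "__main__":
    for when, event, actor, target in persistence_timeline(SAMPLE_EVENTS, "2024-05-01T12:00:00Z"):
        print(when.isoformat(), event, actor, target)
```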

Lambda and serverless abuse

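Attackers with Lambda create/update permissions can deploy malicious functions that execute in the cloud environment with the Lambda execution role's permissions. Lambda functions can establish persistent network access, exfiltrate data, and pivot to other services. Monitor UpdateFunctionCode, CreateFunction, and AddPermission events. In CloudTrail these Lambda event names carry an API version suffix (for example UpdateFunctionCode20150331v2), so match on the name prefix rather than the exact string.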

Cloud cryptomining

One of the most common cloud attack outcomes: compromised credentials used to launch GPU instances for cryptocurrency mining. Indicators: sudden instance launches in regions not normally used, EC2 instance type changes to GPU/compute-optimized families, GuardDuty CryptoCurrency findings, unexpected compute cost spikes.
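Two of these indicators, launches in unusual regions and GPU instance types, can be pulled from RunInstances events, as in the sketch below. The list of usual regions is an assumption supplied by the analyst, and the GPU check is a simple heuristic on instance families starting with p or g, not an authoritative catalogue.

```python
import re

# Heuristic: GPU families such as p3, p4d, g4dn, g5 start with "p" or "g" plus a digit.
GPU_FAMILY = re.compile(r"^[pg]\d")


def suspicious_launches(records, usual_regions):
    """Return (region, instance_type, reasons) for RunInstances calls worth reviewing."""
    findings = []
    for record in records:
        if record.get("eventName") != "RunInstances" or record.get("errorCode"):
            continue
        region = record.get("awsRegion", "unknown")
        instance_type = (record.get("requestParameters") or {}).get("instanceType", "unknown")
        reasons = []
        if region not in usual_regions:
            reasons.append("unusual region")
        if GPU_FAMILY.match(instance_type):
            reasons.append("GPU instance family")
        if reasons:
            findings.append((region, instance_type, reasons))
    return findings


# Synthetic events (illustrative values only).
SAMPLE_EVENTS = [
    {"eventName": "RunInstances", "awsRegion": "us-east-1",
     "requestParameters": {"instanceType": "t3.micro"}},
    {"eventName": "RunInstances", "awsRegion": "ap-southeast-2",
     "requestParameters": {"instanceType": "p3.8xlarge"}},
    {"eventName": "RunInstances", "awsRegion": "us-east-1",
     "requestParameters": {"instanceType": "g4dn.xlarge"}},
    {"eventName": "RunInstances", "awsRegion": "eu-north-1", "errorCode": "UnauthorizedOperation",
     "requestParameters": {"instanceType": "p3.8xlarge"}},
]

if __name__ == "__main__":
    for finding in suspicious_launches(SAMPLE_EVENTS, usual_regions={"us-east-1"}):
        print(finding)
```

Failed launch attempts are excluded from the findings here, though a burst of denied RunInstances calls across regions is itself a useful signal.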

Evidence Collection in Cloud Environments

Evidence collection must happen before ephemeral resources are terminated. Cloud IR requires pre-planned collection procedures, not ad-hoc responses.

Instance memory capture

Cloud instances cannot be physically seized. For volatile memory collection from running instances, use SSM Run Command (AWS) or Azure Run Command to execute a memory acquisition tool remotely (for example LiME on Linux or WinPmem on Windows) and ship the image to a forensic bucket for later analysis with Volatility. Must be done before the instance is terminated or reimaged.

EBS/disk snapshot

Create an EBS snapshot (AWS) or managed disk snapshot (Azure) of the compromised instance's storage before termination. Snapshots preserve the disk state for offline forensic analysis. Tag snapshots with the incident ID and preserve them in a forensic account isolated from the compromised environment.
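For the AWS case, the tagging step might be prepared as below. This is a sketch that only builds the keyword arguments for the EC2 CreateSnapshot call (VolumeId, Description, TagSpecifications); the tag keys, the helper name, and the sample IDs are assumptions, and the actual API call is left to the caller.

```python
def snapshot_requests(volume_ids, incident_id, instance_id):
    """Build one CreateSnapshot argument set per volume, tagged with the incident ID."""
    requests = []
    for volume_id in volume_ids:
        requests.append({
            "VolumeId": volume_id,
            "Description": f"IR {incident_id}: {instance_id} {volume_id}",
            "TagSpecifications": [{
                "ResourceType": "snapshot",
                "Tags": [
                    # Tag keys are a hypothetical convention, not an AWS requirement.
                    {"Key": "IncidentId", "Value": incident_id},
                    {"Key": "SourceInstance", "Value": instance_id},
                    {"Key": "SourceVolume", "Value": volume_id},
                ],
            }],
        })
    return requests


if __name__ == "__main__":
    for request in snapshot_requests(
        ["vol-0aaa1111bbbb2222c", "vol-0ddd3333eeee4444f"],
        incident_id="IR-2024-001",
        instance_id="i-0123456789abcdef0",
    ):
        print(request["VolumeId"], request["Description"])
```

Assuming boto3 is installed and credentials for the IR role are configured, each dictionary could be passed as `boto3.client("ec2").create_snapshot(**request)`. Sharing or copying the resulting snapshots into the forensic account is a separate step not shown here.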

Log preservation

Ship all relevant logs to an immutable forensic account before they expire. CloudTrail logs stored in S3 with Object Lock in compliance mode cannot be deleted or overwritten until the retention period ends. For active incidents, ensure logs are not held only in the compromised account, where the attacker may delete them; ship to a separate logging account with restricted access.

Network capture

VPC Traffic Mirroring (AWS) can capture raw network traffic from EC2 instances to a forensic listener in real time. Azure Network Watcher provides packet capture capability. Enable traffic mirroring immediately upon incident detection for active compromises.

IAM credential revocation

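Immediately revoke compromised IAM credentials. For AWS: deactivate or delete long-lived access keys (iam:UpdateAccessKey to set the key Inactive, or iam:DeleteAccessKey) and attach an explicit deny policy to the compromised principal. Temporary session tokens cannot be deleted, so the deny policy should block sessions issued before the revocation time (the aws:TokenIssueTime condition key). For Azure: revoke all refresh tokens for the compromised account. Revoke as the first containment step to prevent additional unauthorized actions during the investigation window.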
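For temporary credentials issued from a role, an explicit deny can be scoped to sessions issued before a cutoff using the aws:TokenIssueTime condition key. The sketch below only builds that policy document; the function name and Sid are hypothetical, and the cutoff is assumed to be a timezone-aware datetime.

```python
import json
from datetime import datetime, timezone


def revoke_sessions_policy(issued_before=None):
    """Build an explicit-deny policy for role sessions issued before the cutoff."""
    cutoff = (issued_before or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenySessionsIssuedBeforeCutoff",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "DateLessThan": {"aws:TokenIssueTime": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}
            },
        }],
    }


if __name__ == "__main__":
    policy = revoke_sessions_policy(datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc))
    print(json.dumps(policy, indent=2))
```

Assuming boto3, the JSON string could be attached as an inline policy with `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(policy))`. Sessions issued after the cutoff are unaffected, so the path the attacker used to obtain sessions must also be closed.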

Cloud IR Tooling

Specialized tooling accelerates cloud investigation and reduces time-to-contain.

AWS — CloudTrail Lake and Athena

CloudTrail Lake provides SQL-queryable audit log analysis. Athena queries S3-stored CloudTrail logs using SQL. Both enable rapid investigation of specific API call patterns across long retention windows without requiring a separate SIEM for initial triage.
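As an example of the kind of query involved, the sketch below builds an Athena SQL statement listing everything a given access key did in a time window. The table name `cloudtrail_logs` is an assumption, and the column names follow the CloudTrail table layout commonly used with Athena; adjust both to the table actually defined. Inputs are validated before being placed in the SQL text because they are interpolated as literals.

```python
import re

ACCESS_KEY_ID = re.compile(r"^[A-Z0-9]{16,128}$")
ISO_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def access_key_activity_query(access_key_id, start, end, table="cloudtrail_logs"):
    """Return SQL listing all recorded API calls made with one access key in [start, end)."""
    if not ACCESS_KEY_ID.match(access_key_id):
        raise ValueError("access_key_id does not look like an access key ID")
    for value in (start, end):
        if not ISO_UTC.match(value):
            raise ValueError("timestamps must look like 2024-05-01T00:00:00Z")
    if not IDENTIFIER.match(table):
        raise ValueError("table must be a plain identifier")
    # eventtime is stored as an ISO 8601 string, so string comparison orders correctly.
    return (
        "SELECT eventtime, eventsource, eventname, awsregion, sourceipaddress, errorcode\n"
        f"FROM {table}\n"
        f"WHERE useridentity.accesskeyid = '{access_key_id}'\n"
        f"  AND eventtime >= '{start}' AND eventtime < '{end}'\n"
        "ORDER BY eventtime"
    )


if __name__ == "__main__":
    print(access_key_activity_query(
        "AKIAIOSFODNN7EXAMPLE", "2024-05-01T00:00:00Z", "2024-05-08T00:00:00Z"
    ))
```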

Stratus Red Team (offensive) / CloudSploit (defensive)

Stratus Red Team simulates cloud attack techniques for detection testing, which shows what those attacks look like in logs before a real incident. CloudSploit (Aqua) scans cloud environments for misconfigurations that attackers could exploit. Both are most valuable when used ahead of an incident rather than during one.

Pacu — AWS exploitation framework

Pacu is an open source AWS exploitation framework used by red teams and IR teams to understand attack paths, enumerate permissions from compromised credentials, and test privilege escalation paths. Understanding Pacu's techniques helps IR analysts recognize them in CloudTrail logs.

Cloud Custodian

Cloud Custodian is an open source cloud governance tool that can automate incident response actions: isolate EC2 instances, revoke IAM permissions, quarantine S3 buckets. Pre-built Cloud Custodian policies for common IR actions reduce response time.

Containment in Cloud Environments

Cloud containment must balance stopping attacker access against preserving evidence and maintaining business operations.

IAM credential containment

Primary containment action: revoke compromised credentials. AWS: delete access keys, attach explicit deny SCP or inline policy. Azure: revoke sessions, disable account, reset credentials. GCP: disable service account, revoke all tokens. This stops the attacker's active access without destroying evidence.

Network isolation

Apply restrictive security groups (AWS) or NSGs (Azure) to compromised instances to block all inbound and outbound traffic except forensic investigation access. Create an isolation security group that permits only IR analyst access and attach it to compromised instances.

Account quarantine

For severe compromises, use AWS Service Control Policies or Azure Management Group policies to restrict what can be done within the compromised account. This prevents lateral movement to other accounts within the organization while investigation continues.
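On AWS, a quarantine policy of this kind could look like the sketch below: a Service Control Policy that denies every action in the account unless the caller is one of the incident response roles. The function name, Sid, and role ARN are hypothetical. Note that SCPs do not apply to the organization's management account, so this approach assumes the compromised account is a member account.

```python
import json


def quarantine_scp(ir_role_arns):
    """Build an SCP that denies all actions except for the listed IR role ARNs."""
    allowed = list(ir_role_arns)
    if not allowed:
        raise ValueError("at least one IR role ARN is required, or responders are locked out")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAllExceptIncidentResponders",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"ArnNotLike": {"aws:PrincipalArn": allowed}},
        }],
    }


if __name__ == "__main__":
    scp = quarantine_scp(["arn:aws:iam::*:role/IncidentResponder"])
    print(json.dumps(scp, indent=2))
```

A blanket deny also stops production workloads in the account, so the business impact should be agreed before the policy is attached.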

The bottom line

Cloud forensics and IR success depends on preparation before an incident: log coverage must be configured, evidence collection runbooks must exist, and forensic isolation infrastructure must be pre-built. The worst time to discover that S3 data events were not enabled is mid-investigation of an S3 exfiltration incident. Organizations that invest in cloud IR readiness — log coverage audits, pre-built forensic accounts, and practiced containment procedures — detect faster and contain faster when incidents occur.

Frequently asked questions

What logs are most important for AWS incident response?

For most AWS IR investigations: CloudTrail management events (default-on — all control plane API calls), CloudTrail data events for S3 and Lambda (opt-in — required for data exfiltration investigations), VPC Flow Logs (network metadata), GuardDuty findings (anomaly detection), and IAM Access Analyzer findings (unintended external access). Enable S3 data events and VPC Flow Logs proactively — waiting until an incident to enable them means losing historical evidence.

How do you collect forensic evidence from an EC2 instance before it is terminated?

Three main techniques: (1) EBS snapshot: create a snapshot of the instance's volumes before termination, which preserves disk state for offline forensic analysis; (2) SSM Run Command: execute a memory acquisition tool (LiME) remotely via Systems Manager and send the output to an S3 forensic bucket; (3) VPC Traffic Mirroring: capture live network traffic from the instance to a forensic listener. All three require pre-planning; setting up permissions, agents, and forensic storage ad hoc mid-incident costs time during which evidence can be lost.

What is the most common cloud attack vector?

Compromised IAM credentials are the most common initial access vector in cloud incidents (approximately 40% of cloud breaches per CrowdStrike data). Credentials are stolen via phishing, exposed in code repositories (hardcoded keys in GitHub), exfiltrated from EC2 instance metadata via SSRF vulnerabilities, or accessed from misconfigured S3 buckets containing credential files. Identity and access management misconfiguration (overly permissive roles, no MFA on IAM users) is the enabling vulnerability.

How is cloud lateral movement different from on-premises lateral movement?

On-premises lateral movement typically uses credential theft + network protocols (SMB, WMI, PSExec) to move between systems. Cloud lateral movement uses IAM role assumptions and API calls to move between services and accounts. An attacker who compromises an EC2 instance with an overprivileged instance role uses sts:AssumeRole to access other AWS accounts, Lambda functions, and S3 buckets — all through API calls that look like normal cloud management activity. Detection requires monitoring IAM API call patterns, not network lateral movement indicators.

What is the AWS shared responsibility model and how does it affect incident response?

AWS's shared responsibility model means AWS secures the underlying infrastructure; customers secure what they build on it. For IR, this means: AWS collects some logs (CloudTrail management events) by default, but customers must opt in to others (S3 data events, VPC Flow Logs). AWS cannot provide disk images or memory from terminated instances — customers must capture this themselves before instance termination. AWS Security can provide additional logging and assistance for incidents affecting AWS infrastructure (rare), but customer-account incidents are the customer's responsibility to investigate.

How long should cloud logs be retained for incident response?

CloudTrail logs should be retained for at least 12 months, with the most recent 90 days in hot storage for rapid investigation. Twelve months covers most dwell times (the average cloud dwell time cited above is 93 days) and satisfies most compliance frameworks. Enable S3 Object Lock on CloudTrail log buckets to prevent attackers from deleting evidence. For Azure, Entra ID sign-in logs are kept for at most 30 days by default (7 days on the free tier), so export them to Log Analytics immediately. For M365, Unified Audit Log retention can be extended with Purview audit retention policies.
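The arithmetic behind this recommendation can be checked directly. The sketch below compares each log source's retention against the 93-day average dwell time cited in this guide and reports how many days of attacker activity would already have aged out at detection. The retention figures in the example are the 90-day defaults and the 12-month target mentioned in the text; the function name is hypothetical.

```python
def retention_gaps(retention_days, dwell_days):
    """Return {source: days of activity already expired} for sources shorter than dwell time."""
    return {
        source: dwell_days - days
        for source, days in retention_days.items()
        if days < dwell_days
    }


# Figures taken from this guide: 90-day defaults, a 12-month target, 93-day average dwell.
EXAMPLE_RETENTION = {
    "CloudTrail Event history (default)": 90,
    "Azure Activity Log (default)": 90,
    "CloudTrail in S3 (12 months)": 365,
}

if __name__ == "__main__":
    for source, missing in retention_gaps(EXAMPLE_RETENTION, dwell_days=93).items():
        print(f"{source}: first {missing} days of activity no longer available")
```

With default retention alone, the earliest days of an average-length intrusion, often including initial access, are already gone when the incident is detected.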

Author

Eric Bang, CISSP

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
