Security Log Management Best Practices for Enterprise Teams
Security log management sounds operational rather than strategic, but the quality of your log pipeline determines whether your detection program works. A SIEM fed poor-quality log inputs produces noisy, incomplete detections. A threat hunt with missing log sources finds nothing. A breach investigation with gaps in authentication logs cannot establish the attacker's timeline.
This guide covers the security-specific log management decisions that operations guides overlook: which event IDs actually matter for detection, how to architect collection for high-volume environments, what retention architecture supports both operational detection and retrospective investigation, and how log quality determines the ceiling of your entire detection program.
The Security Log Priority Hierarchy
Not all log sources have equal detection value. Security teams often collect everything available without prioritizing the sources that matter most, resulting in high storage costs and low detection value per GB ingested.
The highest-value security log sources, in priority order:

1. Authentication events (Windows Security Event IDs 4624, 4625, 4648, 4768, 4769, 4771, 4776) — authentication logs are the single most valuable source for detecting credential abuse, lateral movement, and brute force.

2. Endpoint process execution logs (Windows Sysmon Event ID 1, macOS Unified Log process creation, Linux auditd execve) — process telemetry is required to detect malware execution, living-off-the-land techniques, and post-exploitation activity.

3. DNS query logs — DNS is used by nearly every malware family for C2 communication; query logs enable detection of DNS tunneling, beaconing, and DGA domains.

4. Network flow data (NetFlow or IPFIX) — flow data enables lateral movement detection and exfiltration baselining without the storage cost of full packet capture.

5. Cloud audit logs (AWS CloudTrail, Azure Monitor, GCP Cloud Audit Logs) — cloud API calls record every privileged action in your cloud environment; missing these logs leaves blind spots in your largest attack surface.
Lower-priority sources that are collected in many environments but contribute less per GB: DHCP leases (useful for IP-to-hostname mapping but low direct detection value), web proxy access logs (valuable for exfiltration detection but very high volume), and print spooler logs (extremely low detection value except during PrintNightmare-class exploits).
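To make the value of authentication logs concrete, here is a minimal sketch of a brute-force detector over Windows failed-logon events (Event ID 4625). The event field names (`event_id`, `source_ip`, `timestamp`) are assumptions about a normalized record, not any specific SIEM's schema:

```python
from collections import defaultdict
from datetime import timedelta

def find_bruteforce(events, threshold=10, window=timedelta(minutes=5)):
    """Flag source IPs with at least `threshold` failed logons
    (Event ID 4625) inside a sliding time window."""
    failures = defaultdict(list)
    for e in events:
        if e["event_id"] == 4625:
            failures[e["source_ip"]].append(e["timestamp"])
    flagged = set()
    for ip, times in failures.items():
        times.sort()
        # Slide over sorted timestamps: if the Nth failure after any
        # starting failure falls inside the window, flag the source.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                flagged.add(ip)
                break
    return flagged
```

The same shape of query (group by source, count failures in a window) is what a SIEM correlation rule expresses declaratively; if 4625 events are missing or have unreliable timestamps, no version of this detection works.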
Log Collection Architecture for Enterprise Environments
Enterprise log collection has two common failure modes: collecting too little (missing critical sources entirely) and collecting too much (drowning the SIEM in low-value events at unsustainable cost). The right architecture routes logs based on detection value.
Tier 1 — SIEM hot storage: Authentication events, process execution logs, DNS queries from critical systems, cloud audit logs, and firewall deny logs. These are indexed for real-time querying and alert generation. Retention: 90 days minimum, 12 months for regulated environments.
Tier 2 — Cold storage or data lake: Full endpoint telemetry, web proxy logs, DHCP logs, and application verbose logs. These are stored at low cost (S3, Azure Data Lake) and queried only on demand during investigations or threat hunts. Retention: 12-24 months.
Tier 3 — Discard: Logs with no security value — successful health check HTTP 200 responses, routine backup job completions, scheduled task completion logs for known-good automation. Define and document what is explicitly excluded so audit questions can be answered.
For collection infrastructure, choose between agent-based collection (Elastic Agent, Splunk Universal Forwarder, Cribl Edge) and agentless collection (syslog-ng, rsyslog, network device log forwarding via UDP/TCP syslog). Use agents for endpoints where you need high-fidelity event data; use agentless for network devices and systems where agent installation is impractical. Centralize log aggregation at a pipeline layer before the SIEM (Cribl Stream, Logstash, Vector) to normalize, enrich, and filter before indexing — this is where you implement your tiering logic without creating SIEM-specific configuration.
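The tiering logic described above can be sketched as a routing function at the pipeline layer. The tier assignments mirror the three tiers defined earlier; the `source` labels are illustrative placeholders, not tied to any specific product's naming:

```python
# Tier 1 (SIEM hot storage) and Tier 2 (cold storage) source labels.
# Anything unlisted falls through to Tier 3 (discard) by default --
# in practice you would invert this and document explicit discards.
TIER1_SOURCES = {"windows_security", "sysmon", "dns", "cloudtrail", "firewall_deny"}
TIER2_SOURCES = {"edr_full", "web_proxy", "dhcp", "app_verbose"}

def route(event):
    """Return the destination for a normalized event:
    'siem' (hot), 'datalake' (cold), or 'drop' (Tier 3)."""
    src = event.get("source", "")
    if src in TIER1_SOURCES:
        return "siem"
    if src in TIER2_SOURCES:
        return "datalake"
    return "drop"
```

In Cribl Stream, Logstash, or Vector this becomes a routes/pipelines configuration rather than code, but the decision table is the same, and keeping it at the pipeline layer means swapping SIEMs does not mean rewriting your tiering.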
Log Normalization and Quality Standards
Raw logs from different sources use different field names, timestamp formats, and severity scales for the same information. A failed authentication event in Windows Security Event ID 4625 looks completely different from the same event in Linux PAM or Okta audit logs — different field names, different timestamp precision, different reason codes.
Log normalization standardizes fields across sources so detections and queries work across the entire environment without source-specific logic. Common normalization schemas: OCSF (Open Cybersecurity Schema Framework — AWS, Splunk, Palo Alto backed), ECS (Elastic Common Schema — Elastic Stack), CIM (Splunk Common Information Model). Pick one schema and apply it consistently — cross-schema mapping is more work than standardizing upfront.
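As a sketch of what normalization does in practice, here is a mapping of a raw Windows 4625 record into ECS-style dotted field names. The raw input field names (`TargetUserName`, `IpAddress`, `TimeCreated`) are assumptions about a typical Windows event collector's output:

```python
def normalize_win_4625(raw):
    """Map a raw Windows failed-logon (4625) record into
    ECS-style field names so cross-source authentication
    queries need no Windows-specific logic."""
    return {
        "event.code": "4625",
        "event.category": "authentication",
        "event.outcome": "failure",
        "user.name": raw.get("TargetUserName"),
        "source.ip": raw.get("IpAddress"),
        "@timestamp": raw.get("TimeCreated"),
    }
```

An equivalent mapper for Linux PAM or Okta failures would emit the same target fields, which is exactly what lets one detection rule (`event.category: authentication AND event.outcome: failure`) cover all three sources.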
Critical quality checks for log pipelines: timestamp accuracy (all events should use UTC; clock drift on source systems corrupts event correlation), completeness (spot-check that all expected events are arriving — use a test authentication failure and verify it appears within 60 seconds), field fidelity (verify that fields mapped in normalization actually contain expected values, not nulls or truncated strings).
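The three quality checks above can be wired into a per-event validator in the pipeline. This is a sketch assuming ECS-style field names and timezone-aware timestamps; the required-field list would come from your own schema:

```python
from datetime import datetime, timezone, timedelta

def check_event_quality(event, max_lag=timedelta(seconds=60), now=None):
    """Return a list of quality problems for one normalized event:
    naive/missing timestamp, excessive ingestion lag, or empty
    required fields. An empty list means the event passed."""
    problems = []
    now = now or datetime.now(timezone.utc)
    ts = event.get("@timestamp")
    if ts is None or ts.tzinfo is None:
        # Naive timestamps make cross-source correlation unreliable.
        problems.append("timestamp missing or not timezone-aware (expect UTC)")
    elif now - ts > max_lag:
        problems.append(f"ingestion lag exceeds {max_lag.total_seconds():.0f}s")
    for field in ("event.code", "source.ip"):
        if not event.get(field):
            problems.append(f"required field {field} is null or empty")
    return problems
```

Running this against a sampled stream (rather than every event) keeps the overhead negligible while still surfacing clock drift and mapper regressions quickly.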
For Windows environments, ensure audit policy is configured to generate the events you need. Default Windows audit settings do not enable many high-value event categories. Use the Microsoft Security Compliance Toolkit or CIS Windows Benchmark to configure audit policy, or deploy Sysmon with a community configuration (Olaf Hartong's sysmon-modular or SwiftOnSecurity's config) to supplement Windows native logging.
Using Log Data for Threat Hunting
A mature log management program supports proactive threat hunting, not just reactive alert investigation. Threat hunting requires high-quality, long-retention log data queried through a hypothesis-driven analytical process.
The most productive threat hunts start with high-value authentication and process execution logs. Example hunts that depend on log quality: detecting pass-the-hash attacks (requires NTLM authentication events — Event ID 4776 — on hosts where Kerberos is the expected protocol); detecting beacon behavior (requires DNS query logs showing high-frequency, low-TTL queries to newly registered domains); detecting credential stuffing (requires authentication logs with source IP geolocation and failure-to-success ratio analysis).
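The credential stuffing hunt mentioned above reduces to a failure-to-success ratio per source IP. A minimal sketch, assuming events carry `source_ip` and `outcome` fields (illustrative names, not a specific schema):

```python
from collections import Counter

def stuffing_suspects(auth_events, min_failures=20, max_success_ratio=0.05):
    """Flag source IPs with a high failure volume and a success rate
    low enough to look like automated credential stuffing rather
    than a user mistyping a password."""
    fails, wins = Counter(), Counter()
    for e in auth_events:
        (wins if e["outcome"] == "success" else fails)[e["source_ip"]] += 1
    return [
        ip for ip, f in fails.items()
        if f >= min_failures and wins[ip] / f <= max_success_ratio
    ]
```

The thresholds here are placeholders to be tuned against your baseline; the point is that the hunt is impossible without complete authentication logs that preserve both outcomes and source IPs.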
Log retention directly enables or blocks certain hunt types. Hunts for APT-style intrusions often require 6-12 months of historical data — attackers with long dwell times establish persistence months before active exfiltration begins, and retrospective analysis against newly published IOCs is only possible if the historical logs exist.
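Retrospective analysis against newly published IOCs is mechanically simple once the historical logs exist. A sketch of a retro-hunt sweep over retained DNS query logs (record fields `query` and `timestamp` are assumed names):

```python
def retro_hunt(historical_dns, new_iocs):
    """Sweep retained DNS query logs for domains that only appeared
    on a threat feed after the queries were made. Returns
    {domain: [timestamps of matching queries]}."""
    # Normalize case and trailing dots so FQDN variants still match.
    iocs = {d.lower().rstrip(".") for d in new_iocs}
    hits = {}
    for rec in historical_dns:
        q = rec["query"].lower().rstrip(".")
        if q in iocs:
            hits.setdefault(q, []).append(rec["timestamp"])
    return hits
```

If retention is 90 days and the IOC is published five months after first contact, this sweep returns nothing — which is the argument for 12-24 months of cheap cold storage.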
Build a log catalog: a documented list of every log source, its field mapping in your normalization schema, its current retention period, and the detection rules that depend on it. The catalog makes threat hunts faster (analysts know where to look), makes coverage gaps visible (missing sources appear as blank cells), and supports compliance documentation (auditors can see what you collect and retain).
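The catalog itself can start as a small structured document. A sketch with hypothetical entries, where blank fields are what surfaces as coverage gaps:

```python
# Minimal log catalog: one entry per source, recording its schema
# mapping, retention, and the detections that depend on it.
CATALOG = {
    "windows_security": {"schema": "ECS", "retention_days": 365,
                         "detections": ["brute_force", "lateral_movement"]},
    "dns": {"schema": "ECS", "retention_days": 90,
            "detections": ["dga", "tunneling"]},
    "web_proxy": {"schema": "", "retention_days": 540, "detections": []},
}

def coverage_gaps(catalog):
    """Sources with no schema mapping or no dependent detections --
    the 'blank cells' that make gaps visible."""
    return sorted(
        name for name, entry in catalog.items()
        if not entry["schema"] or not entry["detections"]
    )
```

In practice this lives in a YAML file or a wiki table rather than code, but keeping it machine-readable lets you enforce it in CI: a new detection rule that references an uncataloged source fails the build.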
The bottom line
Log management quality determines the ceiling of your entire detection and response program. Prioritize authentication events, process execution telemetry, DNS logs, and cloud audit logs at 90-day hot retention before worrying about anything else. Normalize at the pipeline layer before the SIEM, implement tiered storage to control costs, and configure Windows audit policy beyond defaults. A small set of high-quality, well-normalized log sources enables better detection than a large volume of poorly structured, partially collected data.
Frequently asked questions
How long should I retain security logs?
NIST SP 800-92 recommends a minimum of 90 days of accessible log retention. CISA recommends 12 months for most federal environments. PCI DSS requires 12 months of audit log retention with 3 months immediately available for analysis. HIPAA requires 6 years of audit log retention as part of the audit control standard. For threat hunting programs that need to detect APT-style intrusions with long dwell times, 24 months of retention enables retrospective analysis against newly published IOCs. Use tiered storage: expensive SIEM hot storage for 90 days, cheap object storage for 12-24 months.
What Windows Event IDs should every SOC be collecting?
Minimum required Windows Security events: 4624 (successful logon), 4625 (failed logon), 4648 (logon with explicit credentials), 4768/4769 (Kerberos ticket requests), 4771 (Kerberos pre-auth failed), 4776 (NTLM authentication), 4720/4722/4724/4725/4728/4732/4756 (account and group management), 4688 with command-line logging enabled (process creation), 4698/4702 (scheduled task creation/modification), 7045 (new service installed). For advanced detection, deploy Sysmon and collect Event IDs 1 (process creation), 3 (network connection), 7 (image loaded), 10 (process access), 11 (file create), 12/13 (registry modification), 22 (DNS query).
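The event ID lists above translate directly into a collection-time allowlist. A sketch, with the channel names taken from the article and the function shape as an assumption about how your forwarder exposes filtering:

```python
# Windows Security event IDs named above: authentication, account and
# group management, process creation, scheduled tasks, new services.
SECURITY_IDS = {4624, 4625, 4648, 4768, 4769, 4771, 4776,
                4720, 4722, 4724, 4725, 4728, 4732, 4756,
                4688, 4698, 4702, 7045}
# Sysmon event IDs named above for advanced detection.
SYSMON_IDS = {1, 3, 7, 10, 11, 12, 13, 22}

def should_collect(channel, event_id):
    """Allowlist filter for Windows event forwarding."""
    if channel == "Security":
        return event_id in SECURITY_IDS
    if channel == "Microsoft-Windows-Sysmon/Operational":
        return event_id in SYSMON_IDS
    return False
```

An allowlist like this is how you keep Tier 1 ingest volume bounded: everything else from these channels either routes to cold storage or is explicitly documented as discarded.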
What is the difference between syslog and SIEM?
Syslog is a protocol and format for transmitting log messages from devices to a collection server — it is the transport layer. A SIEM (Security Information and Event Management) is an application that collects, normalizes, correlates, and alerts on security events — it is the analytics layer. Most organizations use syslog (or agent-based forwarders) to collect logs from network devices, servers, and applications and forward them to a SIEM for analysis. The SIEM does detection; syslog does transport.
How do I handle log management for cloud-native workloads?
Cloud-native workloads — Lambda functions, containers, managed Kubernetes, serverless — have different log collection requirements than traditional servers. Enable native cloud audit logging immediately: AWS CloudTrail in all regions with a multi-region trail, Azure Diagnostic Settings for all subscriptions, GCP Cloud Audit Logs for Admin Activity and Data Access. For application logs from ephemeral containers, use a sidecar or node-level log collector (Fluentd, FluentBit) to forward to your central log pipeline before containers are terminated. Cloud-native applications that only log to stdout and have no persistent log collection will leave gaps in your security visibility.
What is Sysmon and should I deploy it?
Sysmon (System Monitor) is a free Windows system service from Microsoft Sysinternals that generates high-fidelity process creation, network connection, file creation, and registry modification events beyond what Windows native audit policy provides. It is the single most impactful change you can make to Windows endpoint logging quality. Deploy it with a community configuration (Olaf Hartong's sysmon-modular is the most comprehensive) rather than the default minimal configuration. The processing overhead is low (less than 2% CPU on modern hardware). Forward Sysmon events to your SIEM as Event Log source Microsoft-Windows-Sysmon/Operational.
Founder & Cybersecurity Evangelist, Decryption Digest
Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.