Splunk SPL Cheat Sheet: Essential Security Queries for SOC Analysts
Splunk's dominance in enterprise security operations is built on SPL. Every investigation, detection, and dashboard in Splunk Enterprise Security is ultimately a search written in SPL, whether authored by an analyst, generated by a data model acceleration, or baked into a Technology Add-on's saved search. The analysts who are most effective in Splunk are not the ones who know the most SPL commands but the ones who have a library of proven patterns they can adapt quickly to new investigation scenarios.
This guide is organized the way a working SOC analyst thinks: start with the foundational commands, then work through the investigation scenarios that come up most frequently. Each query includes annotations explaining why specific choices were made, not just what the commands do.
SPL Fundamentals: The Pipeline Model and Search Modes
Every SPL search is a pipeline. Raw events enter from the left and flow through a series of commands separated by pipe characters (|). Each command receives the result set from the previous command, transforms it, and passes it to the next command. The final command's output is what appears in the search results.
The simplest useful security search:
index=windows sourcetype=WinEventLog EventCode=4625
| stats count by Account, src_ip
| sort -count
| head 20
This finds the 20 source IPs and accounts with the most failed Windows logons in the current time range. The stats command aggregates (count of events per Account+src_ip combination), sort -count orders by count descending, and head 20 returns only the top 20 rows.
Search modes affect performance:
Splunk's search modes control how much event data is processed:
- Fast mode: Splunk extracts only the fields required by the search commands; no field discovery, no event decoration. Best for high-performance searches where you know exactly which fields you need.
- Smart mode (default): Splunk behaves like fast mode for transforming searches (searches that produce statistical tables) and like verbose mode for non-transforming searches. The default for most analyst searches.
- Verbose mode: full field extraction, event decoration, and timeline generation for every event. Best for exploratory investigation searches where you do not yet know which fields matter. The most expensive mode.
For production scheduled searches and correlation rules, avoid verbose mode: select fast mode explicitly in the UI rather than relying on the smart-mode default, and when dispatching searches via the REST API, set the adhoc_search_level parameter to "fast".
Time range best practice:
Always specify an explicit time range rather than relying on the UI time picker. Scheduled searches use earliest and latest parameters: earliest=-60m@m latest=now. For scheduled correlation searches that run every 5 minutes with a 1-hour lookback, use earliest=-65m@m latest=-5m@m (with a 5-minute overlap buffer to avoid missing events due to indexing latency). Event indexing latency in Splunk is typically under 30 seconds for Universal Forwarder-delivered events but can reach several minutes for high-volume sources during ingestion spikes.
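The lookback-with-overlap pattern maps directly onto savedsearches.conf scheduling settings. The stanza name below is hypothetical; cron_schedule, dispatch.earliest_time, and dispatch.latest_time are the actual configuration keys:
[Brute Force Detection - Rule]
cron_schedule = */5 * * * *
dispatch.earliest_time = -65m@m
dispatch.latest_time = -5m@m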
Using subsearches for dynamic filtering:
Subsearches run first and pass their results to the outer search as filter terms:
index=firewall
    [ search index=proxy dest_category=malware
    | return 20 dest ]
The bracketed subsearch executes first; return 20 dest emits up to 20 dest="value" clauses joined by OR, which the outer firewall search applies as a filter. The format command is the general-purpose alternative when the subsearch needs to return multiple fields per row. Subsearches have a result limit (10,000 rows by default) and should not be used as a substitute for proper joins or lookups in high-volume searches.
Core Commands Every Security Analyst Must Know
The following command reference covers the SPL operations that appear most frequently in security investigation and detection work.
stats: Aggregates events into a statistical summary. The security analyst's most used command.
| stats count as EventCount,
dc(src_ip) as UniqueSourceIPs,
values(EventCode) as EventCodes,
earliest(_time) as FirstSeen,
latest(_time) as LastSeen
by Account
dc() computes a distinct count. values() collects all unique values into a multi-value field. earliest() and latest() return the minimum and maximum time values.
eval: Computes new fields using expressions. Supports conditional logic, string manipulation, and mathematical operations.
| eval
IsAdmin = if(match(Account, "admin|svc_|SYSTEM"), "true", "false"),
Duration = strptime(EndTime, "%Y-%m-%d %H:%M:%S") - strptime(StartTime, "%Y-%m-%d %H:%M:%S"),
Category = case(
count > 100, "High Volume",
count > 20, "Medium Volume",
true(), "Low Volume"
)
rex: Extracts fields from unstructured strings using named capture groups (PCRE syntax).
| rex field=_raw "src=(?<SrcIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+dst=(?<DstIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
The named capture group (?<FieldName>pattern) creates a new field in the event. rex also supports sed-style substitution: rex field=CommandLine mode=sed "s/temp/REDACTED/g" applies the substitution to the CommandLine field in place.
transaction: Groups related events into logical transactions based on common field values, with optional time constraints. transaction automatically adds duration and eventcount fields to each grouped result.
index=firewall action=* src_ip=192.168.1.100
| transaction src_ip maxspan=1h maxpause=5m
| where eventcount > 10
lookup: Enriches events with data from a lookup table (CSV file, KV Store, or external lookup).
| lookup dnslookup clientip as src_ip OUTPUT clienthost
| lookup threat_intel_ips ip as src_ip OUTPUT threat_category, threat_confidence
| where isnotnull(threat_category)
inputlookup: Reads a lookup table as a data source (as opposed to enriching existing events).
| inputlookup threat_intel_ips.csv
| where threat_confidence > 80
| map maxsearches=50 search="search index=proxy dest_ip=$ip$"
The map search string must begin with the search command, and maxsearches (default 10) caps how many per-row searches are dispatched; raise it deliberately for larger lookups, keeping in mind that map runs one search per input row and is expensive.
Authentication and Identity Investigation Queries
Authentication telemetry (Windows Security Event logs, Okta logs, Azure AD sign-in logs) provides the most reliable signal for detecting credential-based attacks. The following queries cover the most common investigation scenarios.
Brute force detection: failed logins per account per source IP:
index=windows sourcetype=WinEventLog EventCode=4625
earliest=-1h
| stats count as FailedCount,
values(Source_Network_Address) as SourceIPs,
dc(Source_Network_Address) as UniqueSourceIPs
by Account
| where FailedCount > 20
| sort -FailedCount
Credential stuffing: failed logins followed by success from same source:
index=windows sourcetype=WinEventLog (EventCode=4624 OR EventCode=4625)
earliest=-2h
| eval Action = if(EventCode==4624, "Success", "Failure")
| stats values(Action) as Actions,
count(eval(Action="Failure")) as Failures,
count(eval(Action="Success")) as Successes
by Account, Source_Network_Address
| where Failures > 5 AND Successes > 0
Privileged group membership changes (Windows Security Event):
index=windows sourcetype=WinEventLog EventCode IN (4728, 4732, 4756)
| rex field=_raw "Member:\s+Security ID:\s+\S+\s+Account Name:\s+(?<MemberName>[^\n]+)"
| rex field=_raw "Group:\s+Security ID:\s+\S+\s+Group Name:\s+(?<GroupName>[^\n]+)"
| eval PrivilegedGroup = if(match(GroupName, "Domain Admins|Enterprise Admins|Schema Admins|Administrators|Backup Operators"), "true", "false")
| where PrivilegedGroup="true"
| table _time, SubjectUserName, MemberName, GroupName, ComputerName
Account lockout investigation:
index=windows sourcetype=WinEventLog EventCode=4740
earliest=-24h
| stats count as LockoutCount,
values(Source_Workstation) as SourceWorkstations,
values(Caller_Computer_Name) as CallerComputers
by TargetUserName
| where LockoutCount > 2
| sort -LockoutCount
For each locked-out account with multiple lockouts, the Source_Workstation field indicates which machine is generating the bad authentication requests, pointing the investigation toward the specific endpoint that has stored or is using the wrong credentials.
Network Investigation Queries
Network telemetry (proxy logs, DNS logs, firewall logs, NetFlow) provides visibility into C2 communication, data exfiltration, lateral movement, and scanning activity.
DNS exfiltration detection: high subdomain volume per domain:
index=dns sourcetype=dns_logs (record_type=A OR record_type=TXT)
earliest=-1h
| rex field=query "(?<subdomain>[^\.]+)\.(?<domain>[^\.]+\.[^\.]+)$"
| stats dc(subdomain) as UniqueSubdomains, count as QueryCount
by src_ip, domain
| where UniqueSubdomains > 50
| lookup alexa_top1m domain OUTPUT rank
| where isnull(rank) ```exclude Alexa top 1M domains```
| sort -UniqueSubdomains
High unique subdomain counts for non-Alexa-listed domains are a strong signal for DNS tunneling (Iodine, dnscat2) or DNS-based data exfiltration using dynamically generated subdomains to encode data.
Beaconing detection: regular outbound connection intervals:
(index=proxy OR index=netflow) (sourcetype=bluecoat OR sourcetype=palo_alto_traffic)
earliest=-24h
| streamstats current=f last(_time) as prev_time by src_ip, dest_ip, dest_port
| eval delta = prev_time - _time
| stats count as ConnectionCount,
avg(bytes_out) as AvgBytesOut,
stdev(delta) as IntervalStdev
by src_ip, dest_ip, dest_port
| where ConnectionCount > 10
| where IntervalStdev < 300 ```low variance in connection gaps = regular intervals (beaconing)```
| where AvgBytesOut < 5000 ```small consistent payloads typical of beacons```
| sort IntervalStdev
streamstats computes the gap between consecutive connections for each source/destination/port tuple (events arrive newest first, so prev_time - _time yields the positive inter-connection gap). A low standard deviation of those gaps indicates connections at regular intervals, the classic beacon signature; the standard deviation of raw timestamps would only measure how clustered the traffic is, not how regular.
Large data transfer detection (potential exfiltration):
index=proxy sourcetype=bluecoat (action=TCP_TUNNEL OR action=TCP_MISS)
earliest=-24h
| eval BytesMB = bytes_out / 1048576
| stats sum(BytesMB) as TotalMBOut, count as RequestCount
by src_ip, dest_host
| where TotalMBOut > 500 ```flag transfers over 500 MB```
| lookup alexa_top1m domain as dest_host OUTPUT rank
| where isnull(rank) ```exclude known CDNs and legitimate cloud services```
| sort -TotalMBOut
Port scanning detection (firewall deny logs):
index=firewall sourcetype=palo_alto_traffic action=deny
earliest=-15m
| stats dc(dest_port) as UniquePortsProbed, count as DenyCount
by src_ip, dest_ip
| where UniquePortsProbed > 20
| sort -UniquePortsProbed
Endpoint Detection Queries
Endpoint telemetry from Windows Event logs, Sysmon, or EDR platforms provides visibility into process execution, persistence mechanisms, and lateral movement techniques.
Suspicious PowerShell execution (encoded commands):
index=windows (sourcetype=WinEventLog OR sourcetype=XmlWinEventLog) EventCode=4688
| where match(Process_Command_Line, "(?i)-enc\s|EncodedCommand|FromBase64String|Invoke-Expression|IEX")
| rex field=Process_Command_Line "(?i)-enc(?:odedcommand)?\s+(?<B64Payload>[A-Za-z0-9+/=]+)"
| table _time, ComputerName, Creator_Process_Name, Process_Command_Line, B64Payload
| sort -_time
SPL has no built-in base64 decode eval function; decode the extracted B64Payload externally (for example in CyberChef) or with a decode command provided by an add-on.
Persistence: new scheduled task creation:
index=windows sourcetype=WinEventLog EventCode=4698
| rex field=_raw "Task Name:\s+(?<TaskName>[^\r\n]+)"
| rex field=_raw "Task Content:\s+(?<TaskContent>[\s\S]+?)</Task>"
| where NOT match(TaskName, "^\\\\Microsoft\\\\|^\\\\Windows\\\\")
| stats count by _time, SubjectUserName, ComputerName, TaskName
| sort -_time
Lateral movement via PsExec:
index=windows sourcetype=WinEventLog EventCode=7045
| where match(ServiceFileName, "\\\\ADMIN\$\\\\|PSEXESVC|paexec")
| stats count by _time, ComputerName, ServiceName, ServiceFileName, AccountName
| sort -_time
EventCode 7045 (System) logs new service installations. PsExec installs a temporary service (PSEXESVC) on the target host, which is logged here.
Living-off-the-land binary (LOLBin) abuse:
index=windows sourcetype=WinEventLog EventCode=4688
| where match(New_Process_Name, "(?i)certutil\.exe|bitsadmin\.exe|mshta\.exe|regsvr32\.exe|rundll32\.exe|wscript\.exe|cscript\.exe")
| where match(Process_Command_Line, "(?i)http|ftp|download|decode|urlcache")
| table _time, ComputerName, Creator_Process_Name, New_Process_Name, Process_Command_Line
| sort -_time
This query looks for Windows built-in tools being used to download content from the internet or decode data, a technique used extensively in fileless malware and LOLBin-based attacks (T1218 in MITRE ATT&CK).
Writing Splunk Enterprise Security Correlation Searches and Risk-Based Alerting Rules
Splunk Enterprise Security (ES) is built on top of the core Splunk platform and adds the correlation search framework, risk-based alerting, the threat intelligence framework, and the investigation workflow. Writing effective ES correlation searches requires understanding CIM data models and the RBA architecture.
CIM-based correlation searches using tstats:
The tstats command queries pre-computed data model accelerations, which are dramatically faster than raw index searches for large time windows:
| tstats summariesonly=true
count as EventCount,
dc(Authentication.src) as UniqueSources
from datamodel=Authentication.Authentication
where Authentication.action=failure
by Authentication.user, _time span=1h
| rename Authentication.user as user
| where EventCount > 30 AND UniqueSources > 1
The summariesonly=true parameter restricts the search to the accelerated summary data only, returning results in seconds rather than minutes for large authentication datasets.
Converting a correlation search to a risk-based alerting rule:
Instead of generating a notable event (ES incident) directly, a risk-based correlation search writes a risk event to the Risk Index:
| tstats summariesonly=true count from datamodel=Authentication.Authentication
where Authentication.action=failure by Authentication.user, Authentication.src
| rename Authentication.* as *
| where count > 20
| eval risk_score = case(
count > 100, 75,
count > 50, 50,
count > 20, 25
),
risk_object = user,
risk_object_type = "user",
risk_message = "Brute force detected: " . count . " failures from " . src
| table risk_object, risk_object_type, risk_score, risk_message, src, count
This search outputs a risk score per user. The Risk Notable correlation search (pre-built in ES) monitors the Risk Index and creates an ES Notable Event when a risk object (user, system, or IP) accumulates more than the configured threshold (commonly 100 points) within a rolling time window.
Configuring risk object types:
ES risk objects are categorized as User, System, or Other. Correctly categorizing the risk object type ensures that related risk events for the same entity aggregate together in the Risk Analysis dashboard. Use risk_object_type = "user" for account-based events, "system" for computer-based events, and "other" for IP addresses or domains that do not map cleanly to either category.
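A minimal sketch of populating these fields in the eval stage of a risk rule, assuming CIM-style user and dest fields are present:
| eval risk_object = coalesce(user, dest),
risk_object_type = case(
isnotnull(user), "user",
match(dest, "^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"), "other",
true(), "system"
)
coalesce picks the first non-null candidate, and the case branches tag account-based events as "user", bare IP addresses as "other", and everything else as "system".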
SPL Performance Optimization
Search performance in Splunk is determined primarily by how early in the pipeline the search engine can reduce the number of events being processed. The following optimization patterns apply to both ad-hoc investigation searches and scheduled correlation searches.
Rule 1: Maximize the efficiency of the base search.
The base search (everything before the first pipe) is evaluated by the indexer layer using the inverted index. Searches that specify index, sourcetype, and field-value pairs with exact equality (EventCode=4625) use the index efficiently. Searches that use wildcards in the middle or end of values (EventCode=46*) are less efficient. Searches that use NOT with wildcards (NOT sourcetype=*windows*) force a full scan.
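For example (illustrative patterns only):
Efficient, index-friendly base search:
index=windows sourcetype=WinEventLog EventCode=4625
Inefficient, forces a broad scan:
index=* NOT sourcetype=*windows* EventCode=46*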
Rule 2: Use tstats for aggregation over large time windows.
For searches aggregating over more than 4 hours of data across high-volume sourcetypes, tstats with data model acceleration is 10 to 100 times faster than stats over raw events, assuming the relevant fields are in the CIM data model.
Rule 3: Reduce event count before expensive operations.
Joins (join), transactions (transaction), and lookups with many rows are expensive. Apply stats or where filters before these operations to minimize the row count passed to the expensive command.
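A sketch of the pattern, reusing the threat intelligence lookup from earlier: aggregate first, so the lookup runs against one row per unique destination rather than against every raw event:
index=proxy earliest=-24h
| stats count by dest_ip
| lookup threat_intel_ips ip as dest_ip OUTPUT threat_category
| where isnotnull(threat_category)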
Rule 4: Use indexed fields in base searches where possible.
Splunk stores specific fields in the inverted index alongside the raw events (declared in fields.conf with INDEXED = true and populated by index-time transforms). Searches on indexed fields (typically source, sourcetype, host, index, and any fields explicitly configured as indexed in the TA) are faster than searches on extracted fields, which require reading and parsing raw events.
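A related technique is the TERM() directive, which asks the indexer to match a value as a single indexed token:
index=firewall TERM(192.168.1.100)
Because an IP bounded by major breakers is stored as one token in the index (the dots are minor breakers and do not split it there), TERM() matches it directly instead of ANDing the segments 192, 168, 1, and 100 and post-filtering the results.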
Rule 5: Schedule searches at appropriate intervals.
Scheduled searches that run more frequently than necessary consume scheduler resources and may delay other scheduled searches. A correlation search for monthly trend analysis does not need to run every 5 minutes. Match the search interval to the detection latency requirement and the expected frequency of the targeted behavior.
Using the Job Inspector for performance diagnosis:
After running any search, open the Job menu beneath the search bar and select Inspect Job to open the Job Inspector. The inspector shows total execution time, the per-command duration breakdown (how long each pipeline command took), and events scanned versus events returned at each stage. Use this information to identify which pipeline command is the bottleneck and apply the appropriate optimization.
The bottom line
SPL mastery comes from building and maintaining a personal library of proven search patterns. The commands covered here (stats, eval, rex, transaction, lookup, tstats, and their combinations) cover the majority of real security investigation and detection scenarios. The discipline that separates operationally effective SPL from slow, noisy searches is the same discipline that defines good detection engineering: filter early, filter specifically, aggregate before joining, and validate results against real data before deploying to production. Invest the time in understanding CIM data models and tstats if you run Splunk Enterprise Security; the performance improvement is not marginal but transformational.
Frequently asked questions
What are the most important SPL commands for a SOC analyst to learn first?
The six commands that cover the majority of SOC investigation and detection work are: `stats` (aggregate and count events by field values, the foundation of detection queries), `eval` (compute new fields and boolean expressions for filtering and enrichment), `rex` (extract fields from unstructured text using named capture groups), `transaction` (group related events into sessions by common field values), `lookup` (enrich events with data from external CSV or KV store tables), and `where` (filter events using eval-style boolean expressions, as distinct from the `search` command's keyword syntax). Mastering these six before learning the full SPL command set gives analysts the tools to handle 90 percent of real investigation scenarios. The `table` command for selecting output columns and `sort` for ordering results complete the essential set.
How do I search across multiple indexes in Splunk?
Use the `index` keyword with OR logic or wildcard syntax. To search multiple specific indexes: `index=windows OR index=linux sourcetype=syslog`. To search all indexes matching a pattern: `index=security_*` (wildcard). To search all accessible indexes simultaneously: omit the index filter entirely, or use `index=*` (be cautious: this is expensive and slow in large deployments). In Splunk Enterprise Security, the CIM-compliant data models abstract index differences: searching `datamodel=Authentication` queries all authentication events regardless of which index they were indexed to, provided the sourcetypes are mapped to the CIM Authentication data model via their respective Technology Add-ons.
What is the difference between stats and eventstats in SPL?
The `stats` command aggregates events into a summary result set: you get one output row per unique value of the `by` clause fields, with all other fields dropped. The original events are replaced by the aggregated result. `eventstats` computes the same aggregation but adds the result as new fields on each original event row, preserving all original events in the result set. Use `stats` when you want a summary table (how many failed logins per account?). Use `eventstats` when you want to enrich raw events with a computed metric (add each event's account-level total login count as a new field, then filter to accounts with totals exceeding a threshold). A related command is `streamstats`, which computes running aggregations in time order: `streamstats count by src_ip` adds a running count of events from each source IP as events flow through the pipeline.
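The difference in SPL, using the earlier failed-logon example:
index=windows sourcetype=WinEventLog EventCode=4625
| eventstats count as AccountTotal by Account
| where AccountTotal > 20
`stats count by Account` would return one summary row per account; `eventstats` instead keeps every original failed-logon event and tags each with its account's total, so the `where` clause filters individual events by an account-level metric.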
How do I write a Splunk correlation search for brute force detection?
A Splunk Enterprise Security correlation search for brute force detection combines a `stats` aggregation over the Authentication data model with threshold filtering. The search runs against the CIM Authentication data model's failed authentication events: `| tstats summariesonly=true count from datamodel=Authentication.Authentication where Authentication.action=failure by Authentication.src, Authentication.user, _time span=1h | rename Authentication.* as * | where count > 20`. This counts failed logins per source IP and username per hour and alerts when the count exceeds 20. Using `tstats` with `summariesonly=true` queries the accelerated data model summaries rather than raw events, which is 10 to 100 times faster for time-windowed aggregations over large event volumes. The correlation search is saved with a cron schedule (every 5 minutes is typical for brute force) and, in risk-based alerting deployments, is configured with the Risk Analysis adaptive response action (writing to the risk index) rather than a direct notable action.
What is risk-based alerting in Splunk Enterprise Security and how do I implement it?
Risk-based alerting (RBA) is a detection framework in Splunk Enterprise Security that accumulates risk scores on entities (users, systems, IP addresses) based on individual low-confidence signals, rather than generating a direct alert on every individual signal match. Each correlation search, when it fires, applies the Risk Analysis adaptive response action to add a risk score to an entity in the Risk Index. A separate "Risk Notables" rule monitors the Risk Index and generates an incident only when an entity's cumulative risk score crosses a threshold (commonly 100 points) within a time window. Implementation: convert existing correlation searches from direct alert-generating rules to risk-scoring rules by replacing the notable action with the Risk Analysis action and assigning a risk score based on the confidence of each individual signal. Define entity types in ES's Risk Analysis configuration. Set the Risk Notables correlation search threshold to match your alert volume tolerance. RBA reduces alert volume by 60-80% because isolated low-confidence signals no longer generate individual incidents, while entities with multiple concurrent signals accumulate enough risk to surface.
How do I optimize a slow SPL search query?
The primary optimization technique is moving the most restrictive filters to the earliest position in the search pipeline. The `index` and `sourcetype` filters must always come first because Splunk uses them to determine which buckets to search before reading any events. After index/sourcetype, add time range narrowing, then keyword filters using `search` or `where` commands with field-qualified terms (use `field_name=value` rather than bare keyword searches which trigger a full-text scan). Avoid using `NOT` with wildcard patterns and avoid wildcards at the beginning of strings (`*value` prevents index use). For recurring searches that run on a schedule, consider building a summary index or data model acceleration that pre-computes the base aggregation. The Splunk Job Inspector (accessible from the search results page) shows exactly which phase of the search consumed the most time and how many events were processed at each pipeline stage.
How does SPL compare to KQL for Microsoft Sentinel?
Both SPL and KQL are pipe-based query languages designed for log data, and the conceptual model is similar enough that analysts who know one can learn the other in two to four weeks. Key differences: SPL's syntax is more keyword-oriented and verbose (`index=windows sourcetype=WinEventLog EventCode=4625 | stats count by Account, src_ip`) while KQL is more operator-oriented (`SecurityEvent | where EventID == 4625 | summarize count() by TargetAccount, IpAddress`). SPL's `stats` is equivalent to KQL's `summarize`. SPL's `eval` is equivalent to KQL's `extend`. SPL's `rex` for field extraction is equivalent to KQL's `extract` or `parse`. SPL has more mature data model abstraction (CIM) that normalizes across sourcetypes; KQL relies on consistent field naming within each table. For analysts who work in both platforms, the mental model is the same; only the syntax differs.