SIEM Alert Tuning: How to Cut 10,000 Daily Alerts Down to 50 Actionable Ones
Alert fatigue is not a technology problem -- it's a prioritization problem. Every SIEM ships with hundreds of detection rules enabled by default, tuned for generic environments that don't match yours. The path from 10,000 alerts per day to 50 actionable ones runs through five steps: a source analysis to find your top noise generators, a tier-based routing model that separates 'page now' from 'review later,' systematic allowlisting for known-good behavior, threshold calibration for high-volume rules, and retirement of rules that never produce true positives. This guide walks through each step with Splunk and Microsoft Sentinel query examples.
Step 1: Source Analysis -- Find Your Top 5 Noise Generators
Before touching any rule, know where your volume is coming from.
Splunk:
index=notable earliest=-30d
| stats count by source
| sort -count
| head 20
Microsoft Sentinel:
SecurityAlert
| where TimeGenerated > ago(30d)
| summarize count() by AlertName, ProviderName
| order by count_ desc
| take 20
What to look for in the results:
- Any single rule generating >1,000 alerts/day: almost certainly needs tuning or suppression
- Antivirus/EDR detections from known-benign software being flagged repeatedly
- Network IDS rules firing on internal scanning tools or monitoring agents
- Authentication failure rules firing on service accounts with incorrect cached passwords
- Scheduled task or backup software triggering process creation rules
For each top-5 source, answer:
- What percentage of these alerts result in a confirmed incident? (check your closure reason data)
- Is there a common pattern in the false positives? (same source IP, same process, same time window)
- Can the rule be scoped more narrowly, or is the entire rule generating noise?
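The triage arithmetic behind these questions can be sketched in a few lines of Python. This is a minimal illustration, assuming you have exported per-rule alert counts and confirmed-incident counts from either query above; the `RuleStats` structure, rule names, and numbers are hypothetical sample data, not output from a real SIEM.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    name: str
    alerts: int      # alerts in the 30-day window
    confirmed: int   # alerts closed as confirmed incidents


def top_noise_sources(rules, n=5):
    """Return the n highest-volume rules with volume share and confirmation rate."""
    total = sum(r.alerts for r in rules)
    ranked = sorted(rules, key=lambda r: r.alerts, reverse=True)[:n]
    report = []
    for r in ranked:
        share = 100.0 * r.alerts / total if total else 0.0
        confirm = 100.0 * r.confirmed / r.alerts if r.alerts else 0.0
        report.append((r.name, r.alerts, round(share, 1), round(confirm, 1)))
    return report


# Hypothetical sample data for illustration only
sample = [
    RuleStats("AV: PUA detected", 120000, 3),
    RuleStats("IDS: internal port scan", 90000, 0),
    RuleStats("Auth: failed logon burst", 60000, 12),
    RuleStats("Process: scheduled task created", 20000, 1),
    RuleStats("Impossible travel", 10000, 40),
]

for name, alerts, share, confirm in top_noise_sources(sample, n=3):
    print(f"{name}: {alerts} alerts, {share}% of volume, {confirm}% confirmed")
```

A rule with a large volume share and a near-zero confirmation rate is the kind of source worth scoping or suppressing first.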
Step 2: Build a Tier-Based Alert Routing Model
Every alert being 'P1' means nothing is P1. Define tiers clearly and stick to them.
Tier definitions:
| Tier | Label | Response SLA | Criteria | Examples |
|---|---|---|---|---|
| P1 | Page now | Immediate (24/7) | High-confidence, high-impact; automated response would cause harm | Ransomware beacon, admin credential use from new country, data exfil to known C2 |
| P2 | Investigate today | Within 4 business hours | Medium-confidence or requires context to assess | Impossible travel, new service account created, first-time admin logon |
| P3 | Review weekly | Batch review, not real-time | Low-confidence, hunt hypothesis generation, compliance logging | Failed logins below threshold, SMB access to common shares, port scan from internal host |
| Info | Log only | Not worked | Telemetry with no direct action value; used for correlation lookups | DNS query logs, process creation (most), file access (most) |
Tagging rules in Splunk ES:
| makeresults
| eval rule_name="Impossible Travel Login"
| eval tier="P2"
| eval sla_hours=4
| table rule_name, tier, sla_hours
| outputlookup append=true alert_tier_lookup.csv
Routing in Sentinel (action groups): Create separate action groups in Azure Monitor for each tier:
- P1: PagerDuty/OpsGenie webhook + SMS to on-call
- P2: ServiceNow ticket creation + email to SOC queue
- P3: Log to SharePoint/Teams channel for batch review
Analytics rule alert severity maps to tier: High = P1, Medium = P2, Low = P3.
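The severity-to-tier mapping and per-tier destinations can be expressed as a small lookup. The Python sketch below is illustrative only: the destination labels are hypothetical names for the channels listed above, and treating any other severity (such as Sentinel's Informational) as log-only is an assumption, not something the platform enforces.

```python
# Severity-to-tier mapping taken from the text: High = P1, Medium = P2, Low = P3.
SEVERITY_TO_TIER = {"High": "P1", "Medium": "P2", "Low": "P3"}

# Destinations per tier, as listed above. The labels are hypothetical.
TIER_ROUTES = {
    "P1": ["pager_webhook", "sms_on_call"],
    "P2": ["servicenow_ticket", "email_soc_queue"],
    "P3": ["teams_batch_review"],
    "Info": [],  # log only, never routed to a person
}


def route_alert(severity):
    """Return (tier, destinations); unmapped severities fall back to log-only."""
    tier = SEVERITY_TO_TIER.get(severity, "Info")
    return tier, TIER_ROUTES[tier]


print(route_alert("High"))
print(route_alert("Informational"))
```

Keeping the mapping in one place makes it harder for individual rules to drift into P1 without a deliberate decision.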
Step 3: Allowlisting Known-Good Behavior
Allowlisting suppresses alerts for specific, documented conditions that you have confirmed are benign. Each allowlist entry should have: what it suppresses, why it's known-good, who approved it, and an expiry date.
Splunk: allowlist via lookup table
| inputlookup allowlist_processes.csv
Structure of allowlist_processes.csv:
process_name,justification,approved_by,expiry_date
powershell.exe -EncodedCommand dABlAHMAdAA=,Base64 encoded 'test' -- used by CI pipeline,security@company.com,2026-12-01
psexec.exe,IT uses PsExec for remote admin on server OU only,it-director@company.com,2026-08-01
Apply in a detection rule (the where clause keeps only commands with no allowlist match, so only non-allowlisted commands alert):
index=sysmon EventCode=1 Image="*powershell.exe*"
| lookup allowlist_processes.csv process_name AS CommandLine OUTPUT justification
| where isnull(justification)
| stats count by ComputerName, CommandLine, User
Sentinel: allowlist via watchlist
let allowlisted_processes = _GetWatchlist('AllowlistedProcesses')
| project process_name, justification;
SecurityEvent
| where EventID == 4688
| where NewProcessName has "powershell"
| join kind=leftanti allowlisted_processes
on $left.CommandLine == $right.process_name
Allowlist governance rules:
- Maximum 90-day expiry on any allowlist entry (default; extend with re-approval)
- Monthly review: run a query to show all entries expiring in the next 30 days
- Any allowlist entry for a process on the LOLBAS list (living-off-the-land binaries) requires CISO sign-off
- Developers cannot add their own entries -- security team approves all
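The expiry rules above lend themselves to an automated monthly check. The following Python sketch assumes allowlist entries have been loaded as dictionaries with the `process_name` and `expiry_date` columns from the CSV. Because the CSV has no creation date, it approximates the 90-day maximum by flagging entries with more than 90 days remaining; that simplification, the function name, and the sample entries are all assumptions for illustration.

```python
from datetime import date

MAX_LIFETIME_DAYS = 90   # maximum expiry from the governance rules
REVIEW_WINDOW_DAYS = 30  # monthly review horizon


def review_allowlist(entries, today):
    """Sort entries into expired, expiring-soon, and over-long buckets.

    Each entry is a dict with 'process_name' and 'expiry_date' (a date).
    """
    expired, expiring_soon, too_long = [], [], []
    for entry in entries:
        days_left = (entry["expiry_date"] - today).days
        if days_left < 0:
            expired.append(entry["process_name"])
        elif days_left <= REVIEW_WINDOW_DAYS:
            expiring_soon.append(entry["process_name"])
        elif days_left > MAX_LIFETIME_DAYS:
            too_long.append(entry["process_name"])
    return expired, expiring_soon, too_long


# Hypothetical entries for illustration only
entries = [
    {"process_name": "old-tool.exe", "expiry_date": date(2026, 6, 30)},
    {"process_name": "psexec.exe", "expiry_date": date(2026, 8, 1)},
    {"process_name": "ci-runner.exe", "expiry_date": date(2026, 12, 1)},
]

print(review_allowlist(entries, today=date(2026, 7, 15)))
```

Expired entries should be removed, entries expiring soon go to the approver for re-approval, and over-long entries indicate the 90-day default was bypassed.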
Step 4: Threshold Tuning for High-Volume Rules
Some detections fire too frequently because the threshold is wrong for your environment. Failed login rules are the most common example -- a threshold of 5 failures in 10 minutes generates enormous noise in an environment with aggressive password policies or legacy apps.
Finding your right threshold (Splunk):
index=wineventlog EventCode=4625
| bin _time span=10m
| stats count by _time, user, src_ip
| stats avg(count) as avg_count, max(count) as max_count,
perc90(count) as p90_count by user
| sort -p90_count
| head 20
This shows the average, maximum, and 90th percentile failure counts per 10-minute window for each user (counted per source IP). Set your alert threshold above the p90 for normal accounts (so routine mistyped-password bursts don't alert) but below the max for accounts you know have been spray-attacked.
Typical threshold calibration approach:
- Disable the alert (or set to report-only/log-only)
- Run for 2 weeks; collect the count distribution
- Set threshold at 95th percentile + 20% buffer
- Re-enable and track true positive rate for 2 weeks
- Adjust if needed
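The "95th percentile + 20% buffer" step can be worked through as follows. This is a minimal Python sketch, assuming the two-week collection produced a list of per-window failure counts; it uses the nearest-rank percentile definition, which may differ slightly from the interpolation your SIEM applies, and the function names and sample counts are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("no data collected")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def calibrated_threshold(counts, pct=95, buffer=0.20):
    """Threshold = pct-th percentile plus a relative buffer, rounded up to a whole count."""
    return math.ceil(percentile_nearest_rank(counts, pct) * (1 + buffer))


# Hypothetical per-window failure counts collected during the report-only period
observed = list(range(1, 21))

print(percentile_nearest_rank(observed, 95))  # 19
print(calibrated_threshold(observed))         # 23
```

Rounding up keeps the threshold a whole number of failures, which is what a count-based rule compares against.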
Sentinel: dynamic thresholds
Sentinel's ML-based Anomaly rules use adaptive baselines rather than fixed thresholds. For key detections (logon anomalies, data exfiltration volume), prefer Anomaly rules over fixed-threshold Scheduled Query rules when the baseline varies significantly across users or time of day.
Step 5: Retire Rules That Generate Zero True Positives
Every SIEM deployment accumulates rules that have never produced a confirmed incident. These consume analyst time, erode trust in the alert stream, and create noise that masks real detections.
Identify candidates for retirement (Splunk):
index=notable earliest=-90d
| stats count as alert_count,
count(eval(status="resolved")) as resolved,
count(eval(owner!="unassigned")) as worked
by source
| eval worked_rate=round(worked/alert_count*100,1)
| eval resolve_rate=round(resolved/alert_count*100,1)
| where worked_rate < 5
| sort -alert_count
For any rule with <5% worked rate over 90 days, conduct a quick review:
- Is the alert volume so high analysts gave up working it? (tuning problem)
- Does the rule detect something real that's never happened in 90 days? (consider moving to P3/hunt)
- Was the rule designed for a system we no longer have? (retire it)
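If you export the per-rule counts rather than filtering in the SIEM, the same 5% worked-rate cut can be applied offline. The Python sketch below assumes a mapping of rule name to (alert count, worked count); the function name and sample figures are hypothetical.

```python
def retirement_candidates(rule_stats, max_worked_rate=5.0):
    """Return (rule, worked_rate) for rules worked less than max_worked_rate percent."""
    candidates = []
    for rule, (alert_count, worked) in rule_stats.items():
        if alert_count == 0:
            continue  # no alerts in the window, nothing to measure
        worked_rate = round(100.0 * worked / alert_count, 1)
        if worked_rate < max_worked_rate:
            candidates.append((rule, worked_rate))
    # Highest-volume candidates first, matching the sort in the query
    return sorted(candidates, key=lambda c: rule_stats[c[0]][0], reverse=True)


# Hypothetical 90-day counts: rule -> (alert_count, worked)
stats = {
    "Brute force - 5 failures in 1 min": (42000, 300),
    "Impossible travel": (900, 810),
    "Legacy mainframe logon": (1200, 0),
}

for rule, rate in retirement_candidates(stats):
    print(f"{rule}: {rate}% worked")
```

The output is only a candidate list; each rule still needs the three review questions above before anything is demoted.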
Retirement process:
- Move from P1/P2 to P3 (weekly batch review) for 30 days -- don't delete immediately
- Confirm no incidents are missed during the 30-day observation period
- Disable with a comment documenting why and the date
- Full delete after 6 months if no reason to reinstate
Keep a rule retirement log:
| Rule name | Date disabled | Reason | Approved by | Review date |
|---|---|---|---|---|
| Brute force - 5 failures in 1 min | 2026-03-01 | 0 TPs in 180 days, threshold wrong for env | CISO | 2026-09-01 |
Measuring Whether Your Tuning Is Working
Tuning without measurement is guesswork. Track these four metrics monthly:
| Metric | How to measure | Target |
|---|---|---|
| Total alert volume | Count of all alerts in SIEM per day (rolling 30-day avg) | Declining month-over-month |
| Alert worked rate | Alerts assigned to an analyst / total alerts | >80% |
| False positive rate | Alerts closed as false positive / total worked alerts | <20% |
| MTTD (mean time to detect) | Time from attack start to first alert (use purple team data) | Declining |
Splunk dashboard query for monthly trend:
index=notable earliest=-6mon
| bin _time span=1mon
| stats count as total_alerts,
count(eval(status="false positive")) as false_positives,
count(eval(owner!="unassigned")) as worked
by _time
| eval fp_rate=round(false_positives/worked*100,1)
| eval worked_rate=round(worked/total_alerts*100,1)
| table _time, total_alerts, worked_rate, fp_rate
If total alert volume is declining and worked rate is increasing: tuning is working. If total alert volume is declining but worked rate is flat: you're suppressing without fixing root causes.
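The two rates and the month-over-month reading can be sketched in Python as well. This is an illustration under stated assumptions: worked rate is worked alerts over total alerts and false positive rate is false positives over worked alerts, following the metrics table; the function names, the "volume not declining" label, and the sample numbers are hypothetical.

```python
def monthly_metrics(total_alerts, worked, false_positives):
    """Worked rate = worked / total; false positive rate = false positives / worked."""
    worked_rate = round(100.0 * worked / total_alerts, 1) if total_alerts else 0.0
    fp_rate = round(100.0 * false_positives / worked, 1) if worked else 0.0
    return worked_rate, fp_rate


def tuning_verdict(prev, curr):
    """Compare two months, each given as a (total_alerts, worked_rate) tuple."""
    prev_total, prev_worked = prev
    curr_total, curr_worked = curr
    if curr_total < prev_total and curr_worked > prev_worked:
        return "tuning is working"
    if curr_total < prev_total:
        # Volume fell but worked rate is flat or lower
        return "suppressing without fixing root causes"
    return "volume not declining"


# Hypothetical month: 10,000 alerts, 2,000 worked, 1,500 closed as false positive
print(monthly_metrics(10000, 2000, 1500))              # (20.0, 75.0)
print(tuning_verdict((10000, 20.0), (6000, 45.0)))     # tuning is working
```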
The bottom line
Alert fatigue kills security programs quietly. Analysts stop investigating, real incidents get missed, and the SIEM becomes a compliance checkbox rather than a detection tool. The fix is methodical: source analysis first, tier routing second, allowlisting third, threshold calibration fourth, rule retirement last. Each step takes less than a week of hands-on work, although the observation periods in steps 4 and 5 run longer. Do them in order and measure every 30 days.
Frequently asked questions
Where do I start when SIEM alert volume is unmanageable?
Start with a source analysis, not a rule analysis. Run a query to count alerts by source or rule over the last 30 days, sorted by volume descending. The top 5 sources almost always account for more than half your total volume. Fix those five before touching anything else. In Splunk: index=notable | stats count by source | sort -count. In Sentinel: SecurityAlert | summarize count() by AlertName | order by count_ desc.
What is the difference between suppression and tuning?
Suppression silences an alert for a specific condition without changing the underlying rule (e.g., suppress alerts from a known vulnerability scanner IP). Tuning modifies the detection logic itself to reduce false positives at the source (e.g., adding a minimum threshold or excluding known-good processes). Suppression is faster but accumulates technical debt; tuning is more work but improves rule quality permanently. Use suppression for one-off exceptions, tuning for systematic false positive patterns.
How do I know if my SIEM tuning is creating detection blind spots?
Run your suppression/exclusion list through a purple team exercise quarterly: execute the ATT&CK technique that your suppressed rule was designed to detect, and confirm the alert still fires for the malicious version. Also review your exclusion list monthly -- entries added for 'temporary' reasons often become permanent. If an allowlisted process or IP starts exhibiting new behavior, your suppression may be hiding a real threat.
What is a tier-based alert routing model?
A tier-based model routes alerts by required response speed. It has three worked tiers: P1 (page the on-call analyst immediately -- high-confidence, high-impact detections like ransomware indicators or admin credential use from a foreign IP), P2 (investigate within 4 business hours -- medium-confidence detections requiring analyst judgment), and P3 (review weekly in batch -- low-confidence detections used for hunting, not real-time response). Anything below P3 is logged as Info and not worked. Most orgs have everything as P1, which means nothing gets P1 treatment.
Should I tune alerts or buy a SOAR to handle the volume?
Tune first. A SOAR automates the response to alerts -- it does not reduce the number of alerts. If you feed a SOAR 10,000 noisy alerts per day, you get 10,000 automated false positive responses per day, each of which may have side effects (blocking legitimate traffic, flooding ticketing systems). SOAR is most valuable when applied to a well-tuned, lower-volume, higher-fidelity alert stream.
How do I justify SIEM tuning time to leadership when it's not 'adding new detections'?
Frame it as analyst capacity. If your team investigates 50 alerts per day and 40 are false positives, they have capacity for 10 real investigations. Tuning down to 20 false positives frees capacity for 30 real investigations, tripling it without hiring. Calculate: average analyst investigation time (e.g., 15 min per alert) x false positive volume x days per year = hours wasted. Convert to FTE cost. That's the ROI of tuning.
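The calculation can be worked through with the figures from this answer. In the Python sketch below, the 15 minutes per alert and the 40 and 20 false positives per day come from the text; the 365 working days (a 24/7 SOC) and the hourly cost of 60 are placeholder assumptions you should replace with your own numbers.

```python
def wasted_hours_per_year(false_positives_per_day, minutes_per_alert=15, days_per_year=365):
    """Hours spent per year investigating false positives."""
    return false_positives_per_day * minutes_per_alert * days_per_year / 60


def annual_cost(hours, hourly_cost):
    """Convert wasted hours to cost using a fully loaded hourly analyst rate."""
    return hours * hourly_cost


before = wasted_hours_per_year(40)  # 3650.0 hours/year
after = wasted_hours_per_year(20)   # 1825.0 hours/year
saved = before - after

# hourly_cost=60 is a hypothetical placeholder, not a benchmark
print(f"Hours recovered per year: {saved}")
print(f"Value at placeholder rate: {annual_cost(saved, hourly_cost=60)}")
```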

Founder & Cybersecurity Evangelist, Decryption Digest
Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
