Practitioner Guide | May 15, 2026 | 15 min read

How to Write YARA Rules for Malware Detection: Practitioner Guide

Sources: VirusTotal: YARA Documentation | YARA-X: Rust Rewrite Documentation | Awesome-YARA: Community Rules Repository | Velociraptor YARA Artifact Documentation | SANS: Practical Malware Analysis and YARA

Eric Bang

Founder & Cybersecurity Evangelist

500+ billion

Files scanned with YARA rules on VirusTotal annually

90%+

Professional malware analysts using YARA per SANS survey data

3-4x

Scan speed improvement in YARA-X (Rust rewrite) compared to original YARA

10,000+

Public YARA rules in the Awesome-YARA community repository

YARA started as a personal project by Victor Alvarez at VirusTotal and became the most widely adopted file-level detection format in the security industry. Its success comes from a simple insight: malware researchers need a way to express what they know about a malware sample in a portable, executable format that other researchers and security tools can reuse. A YARA rule captures that knowledge as named string patterns and a boolean condition that must be true for a file to match.

Unlike behavioral detection in an EDR or log-based detection in a SIEM, YARA operates on bytes. It does not require the malware to execute, does not require network connectivity, and does not require a running operating system. This makes it uniquely valuable for offline forensic analysis, sandbox integration, retrospective hunting across archived files, and endpoint scanning via tools like Velociraptor. This guide covers everything from rule anatomy to production-grade detection patterns.

YARA Rule Anatomy: Meta, Strings, and Condition Blocks

Every YARA rule follows the same three-section structure:

rule RuleName
{
    meta:
        author = "Analyst Name"
        description = "Detects XYZ malware family loader"
        date = "2026-05-15"
        reference = "https://blog.example.com/xyz-analysis"
        hash = "d41d8cd98f00b204e9800998ecf8427e"  // placeholder (MD5 of an empty file); use the SHA256 of a representative sample
        severity = "high"

    strings:
        $string1 = "malicious_string"
        $hex1 = { 4D 5A 90 00 03 00 00 00 04 00 }
        $regex1 = /[A-Z]{4}\d{8}/

    condition:
        all of them
}

The meta block is optional but essential for operational use. It stores descriptive information that helps analysts understand, manage, and triage rule matches. Useful meta fields include: author, description, date, reference (URL to threat intelligence report or malware analysis), hash (SHA256 of a representative sample), severity, and mitre_attack (ATT&CK technique ID). Meta fields are not evaluated; they do not affect whether a rule matches. They are purely informational.

The strings block defines named patterns. Each string is prefixed with a $ identifier that is referenced in the condition. String names must be unique within a rule. There are three string types (covered in depth in the next section). You can define any number of strings; rules commonly have between 2 and 20.

The condition block is where the detection logic lives. It is a boolean expression that must evaluate to true for the file to match the rule. The condition can reference:

Individual strings: $string1 evaluates to true if that string appears anywhere in the file
Quantifiers: all of them (all strings must match), any of them (at least one), 2 of them (at least two), all of ($hex*) (all strings whose names start with $hex)
File properties: filesize, uint8(), uint16(), uint32() for reading raw bytes at specific offsets
At/in operators: $string1 at 0 (string must appear at byte offset 0), $string1 in (0..512) (string must start at an offset between 0 and 512)
Boolean logic: and, or, not

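The rule name follows standard identifier rules (letters, numbers, underscore; no spaces; cannot start with a number). Rule names must be unique within a namespace. A duplicate rule name in the same namespace is a compile error rather than a warning, so the --no-warnings flag does not suppress it. When loading multiple rule files with the yara CLI, you can avoid collisions by giving each file its own namespace as a prefix on the path (for example, yara ns1:rules1.yar ns2:rules2.yar sample.exe). The rule RuleName : TagName syntax is different: it attaches tags to a rule, which you can use to filter output, and it does not create a namespace.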

String Types in Depth: Text, Hex Patterns, and Regular Expressions

YARA supports three string types, each suited to different detection scenarios.

Text strings match ASCII or Unicode character sequences:

strings:
    $ascii_str = "malware_config_key"
    $wide_str = "malware_config_key" wide  // UTF-16LE (common in PE files)
    $both = "malware_config_key" wide ascii  // Match either encoding
    $nocase = "MalwareConfigKey" nocase  // Case-insensitive match
    $fullword = "cmd" fullword  // Must be bounded by non-alphanumeric chars
    $xor_str = "malware" xor  // Try all single-byte XOR keys (0x00-0xFF)
    $xor_range = "malware" xor(0x01-0x7F)  // Try specific XOR key range

The wide modifier is critical for Windows PE file analysis because many string constants in compiled Windows executables are stored as UTF-16LE (wide character). Without wide, your rule misses strings that appear in wide encoding. Use wide ascii to match both.
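As a rough illustration of why this matters, the Python sketch below (plain standard library, not YARA itself) builds the two byte sequences that the ascii and wide modifiers correspond to and searches a small buffer for each. The helper name yara_text_variants and the buffer contents are hypothetical, chosen only for the example.

```python
def yara_text_variants(text: str) -> dict:
    """Return the byte sequences that correspond to the ascii and wide modifiers."""
    ascii_bytes = text.encode("ascii")
    # wide interleaves a zero byte after every ASCII character,
    # which is the same as UTF-16LE for ASCII-only text.
    wide_bytes = b"".join(bytes([b, 0]) for b in ascii_bytes)
    return {"ascii": ascii_bytes, "wide": wide_bytes}


variants = yara_text_variants("cmd")

# Hypothetical buffer that stores the string as UTF-16LE, as many Windows binaries do.
pe_like_buffer = b"\x00\x00" + "cmd".encode("utf-16-le") + b"\x00\x00"

ascii_hit = variants["ascii"] in pe_like_buffer
wide_hit = variants["wide"] in pe_like_buffer
print(f"ascii match: {ascii_hit}, wide match: {wide_hit}")
```

The plain ASCII search misses the string because every character is followed by a zero byte, which is the gap the wide modifier closes.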

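The xor modifier (introduced in YARA 3.8, with the xor(min-max) key-range form added in 3.11) tries all 256 single-byte XOR encodings of a string. This catches simple XOR-encoded configuration strings and is useful for malware that obfuscates embedded strings with a single-byte key. It does not help with strings that a packer has compressed or encrypted; those only become visible after unpacking or in process memory.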
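The idea behind the modifier can be sketched in a few lines of Python. This is a conceptual illustration under simple assumptions, not how the YARA engine is implemented; the function name find_single_byte_xor and the sample buffer are hypothetical.

```python
def find_single_byte_xor(data: bytes, plaintext: bytes, keys=range(0x00, 0x100)):
    """Return (key, offset) pairs where plaintext XORed with a single-byte key occurs in data."""
    hits = []
    for key in keys:
        encoded = bytes(b ^ key for b in plaintext)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits


# Hypothetical buffer: "malware" encoded with key 0x5A, surrounded by padding bytes.
sample = b"\x90" * 4 + bytes(b ^ 0x5A for b in b"malware") + b"\x90" * 4

all_keys = find_single_byte_xor(sample, b"malware")
# Restricting the key range mirrors the xor(0x01-0x7F) form of the modifier.
limited = find_single_byte_xor(sample, b"malware", keys=range(0x01, 0x80))
print(all_keys, limited)
```

Narrowing the key range reduces the number of candidate encodings, which is the same trade-off the range form of the modifier offers.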

The fullword modifier ensures the matched string is bounded by non-alphanumeric characters on both sides. "cmd" fullword matches " cmd" or "cmd." but not "cmdline". This is a powerful false positive reducer for short strings.
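A small Python sketch of the same boundary check may make the behavior concrete. It assumes that "alphanumeric" means ASCII letters and digits; the function name fullword_offsets and the test buffer are hypothetical.

```python
def fullword_offsets(data: bytes, word: bytes) -> list:
    """Offsets where word occurs with no ASCII letter or digit directly before or after it."""
    offsets = []
    start = data.find(word)
    while start != -1:
        end = start + len(word)
        # A match at the very start or end of the buffer counts as bounded.
        before_ok = start == 0 or not data[start - 1:start].isalnum()
        after_ok = end == len(data) or not data[end:end + 1].isalnum()
        if before_ok and after_ok:
            offsets.append(start)
        start = data.find(word, start + 1)
    return offsets


buffer = b"run cmd.exe /c cmdline cmd"
print(fullword_offsets(buffer, b"cmd"))
```

The occurrence inside "cmdline" is rejected because it is followed by a letter, while the other two occurrences are kept.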

Hex patterns match specific byte sequences and support wildcards and jumps:

strings:
    $mz_header = { 4D 5A }  // MZ magic bytes (PE file header)
    $wildcards = { E8 ?? ?? 00 00 }  // CALL rel32 whose two low offset bytes are unknown
    $nibble_wild = { C6 4? 24 ?? }  // Wildcard lower nibble
    $jump = { 90 [2-4] FF D0 }  // NOP, then 2-4 unknown bytes, then CALL rax
    $alt = { ( 0F 84 | 0F 85 ) ?? ?? }  // Near JE or JNE, followed by two unknown offset bytes

The ?? wildcard matches any single byte. Nibble wildcards fix half of a byte: ?x matches any byte whose lower nibble is x, and x? (as in 4? above) matches any byte whose upper nibble is x. The [n-m] jump syntax matches between n and m bytes of any content. The (A | B) alternation syntax matches either byte sequence A or B. These constructs allow hex patterns to match code sequences that vary between samples due to compiler differences or minor code modifications.
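One way to build intuition for these constructs is to translate them into an ordinary byte-level regular expression. The Python sketch below handles only a subset of the hex-string syntax (whitespace-separated tokens, no comments, no unbounded jumps) and is an illustration, not a reimplementation of YARA; the function name hex_pattern_to_regex is hypothetical.

```python
import re


def hex_pattern_to_regex(pattern: str):
    """Translate a subset of YARA hex-string syntax into a compiled bytes regex."""
    parts = []
    for token in pattern.strip("{} \t\n").split():
        if token == "(":
            parts.append(b"(?:")
        elif token in ("|", ")"):
            parts.append(token.encode("ascii"))
        elif token.startswith("["):
            # Jump: [n] or [n-m] bytes of arbitrary content.
            bounds = re.fullmatch(r"\[(\d+)(?:-(\d+))?\]", token)
            low = int(bounds.group(1))
            high = int(bounds.group(2) or low)
            parts.append(b".{%d,%d}" % (low, high))
        elif token == "??":
            parts.append(b".")
        elif token[1] == "?":
            # Upper nibble fixed, lower nibble wild (for example 4?).
            base = int(token[0], 16) << 4
            parts.append(b"[\\x%02x-\\x%02x]" % (base, base | 0x0F))
        elif token[0] == "?":
            # Lower nibble fixed, upper nibble wild (for example ?4).
            low_nibble = int(token[1], 16)
            members = b"".join(b"\\x%02x" % (hi << 4 | low_nibble) for hi in range(16))
            parts.append(b"[" + members + b"]")
        else:
            parts.append(b"\\x%02x" % int(token, 16))
    # DOTALL so that "." also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


call_pattern = hex_pattern_to_regex("{ E8 ?? ?? 00 00 }")
match = call_pattern.search(b"\x90\xe8\x10\x20\x00\x00\xc3")
print(match.start() if match else None)
```

Seen this way, each wildcard or jump widens the set of byte sequences the pattern accepts, which is why long runs of them cost precision.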

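Regular expressions use a syntax that closely resembles Perl's. YARA implements its own regex engine, so some PCRE features, including backreferences and capture groups, are not available: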

strings:
    $c2_pattern = /https?:\/\/[a-z0-9]{8,16}\.[a-z]{2,4}\/[a-z0-9]{4,8}/
    $base64_blob = /[A-Za-z0-9+\/]{100,}={0,2}/
    $ipv4_pattern = /\b((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)\b/

Regular expressions are the most flexible string type but also the most expensive to evaluate. Avoid regexes that start with a wildcard or a long variable-length repetition, because the scanner then has little fixed content to anchor its search on. Use hex patterns for byte-level matching where possible and reserve regex for text fields that genuinely require pattern matching.
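Before embedding a regex in a rule, it can help to sanity-check it against a few strings you expect to match and a few you do not. The Python sketch below does this for the $c2_pattern expression. Python's re module is not YARA's engine, so treat the result as a first approximation; the URLs are hypothetical test values on reserved domains.

```python
import re

# Same expression as $c2_pattern, written as a Python bytes regex.
C2_PATTERN = re.compile(rb"https?://[a-z0-9]{8,16}\.[a-z]{2,4}/[a-z0-9]{4,8}")


def c2_candidates(data: bytes) -> list:
    """Return every substring of data that the C2 URL pattern matches."""
    return [m.group(0) for m in C2_PATTERN.finditer(data)]


# Hypothetical inputs: one URL shaped like generated C2 infrastructure, one ordinary URL.
suspicious = b"GET http://abcd1234ef.test/x9y8z7 HTTP/1.1"
benign = b"see https://www.example.com/index.html for details"

print(c2_candidates(suspicious))
print(c2_candidates(benign))
```

A quick check like this catches patterns that are obviously too loose or too strict before you spend time on a full corpus scan.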

Writing Effective Conditions: Boolean Logic, Filesize, and Counting Matches

The condition block transforms a list of named strings into a detection decision. Sophisticated conditions are what separate high-precision rules from noisy ones.

Combining strings with AND reduces false positives:

condition:
    $string1 and $hex1 and $regex1

Requiring all three strings to be present simultaneously is exponentially more specific than requiring any one of them. A practical rule of thumb: require at least three independent strings with AND logic for any rule targeting a broad file category like PE executables.

Using filesize to constrain the match scope:

condition:
    filesize < 500KB and $string1

Malware droppers are frequently small. Loader stages are often under 200 KB. Adding a filesize constraint that matches the expected size range of the target sample class eliminates matches against large legitimate executables that happen to contain one of your strings.

Checking file type via magic bytes before evaluating strings:

condition:
    uint16(0) == 0x5A4D  // MZ header (PE file)
    and filesize < 1MB
    and ($string1 or $string2)
    and $hex1

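The uint16(0) == 0x5A4D check confirms the file starts with the MZ magic bytes before evaluating any strings. uint16() and uint32() read little-endian integers, so the on-disk bytes 4D 5A compare as 0x5A4D. This scopes the rule to PE executables only, eliminating false positives in text files, scripts, and documents that might contain the same string content. Similarly, uint32(0) == 0xBEBAFECA matches the CA FE BA BE magic of Mach-O fat binaries (Java class files share the same magic), and uint32(0) == 0x04034B50 identifies ZIP-based files, including OOXML Office documents, whose on-disk bytes are 50 4B 03 04. The big-endian variant uint32be(0) == 0x504B0304 expresses the same ZIP check in on-disk byte order.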
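Byte order is an easy place to make mistakes with magic-byte checks, so it is worth computing the constants rather than typing them from memory. The Python sketch below uses the standard struct module to show what little-endian and big-endian reads return for a few well-known magic values; the helper names are hypothetical.

```python
import struct


def uint16_le(data: bytes, offset: int = 0) -> int:
    """Little-endian 16-bit read, the byte order uint16() uses."""
    return struct.unpack_from("<H", data, offset)[0]


def uint32_le(data: bytes, offset: int = 0) -> int:
    """Little-endian 32-bit read, the byte order uint32() uses."""
    return struct.unpack_from("<I", data, offset)[0]


def uint32_be(data: bytes, offset: int = 0) -> int:
    """Big-endian 32-bit read, the byte order uint32be() uses."""
    return struct.unpack_from(">I", data, offset)[0]


pe_magic = uint16_le(b"MZ\x90\x00")              # on-disk bytes 4D 5A
zip_magic_le = uint32_le(b"PK\x03\x04")          # on-disk bytes 50 4B 03 04
zip_magic_be = uint32_be(b"PK\x03\x04")
fat_macho_le = uint32_le(b"\xca\xfe\xba\xbe")    # on-disk bytes CA FE BA BE

for name, value in [("PE", pe_magic), ("ZIP (LE)", zip_magic_le),
                    ("ZIP (BE)", zip_magic_be), ("Mach-O fat (LE)", fat_macho_le)]:
    print(f"{name}: 0x{value:08X}")
```

The constant you write in the condition has to match the byte order of the function you call, not the order in which the bytes appear in a hex editor.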

Counting string occurrences:

condition:
    #c2_domain > 3  // The # prefix counts occurrences

The #string_name syntax returns the count of times a string appears in the file. Rules for C2 beacon configuration parsers or dropper stubs that embed multiple domains or IPs can use occurrence counting as a signal.

Using string offsets for structural detections:

condition:
    $pe_header at 0
    and $config_marker in (1024..4096)

The at operator requires the string to appear at a specific byte offset; the in operator requires it within a byte range. These are useful for detecting specific file structures where known markers appear at predictable locations, such as custom file format headers or configuration sections at fixed offsets from the PE entry point.
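The counting and offset operators can be pictured as simple queries over the list of offsets where a string matched. The Python sketch below models that idea under two assumptions made for the example: every starting offset counts as a match, and the range in the in operator is inclusive. All function names and the sample buffer are hypothetical.

```python
def match_offsets(data: bytes, pattern: bytes) -> list:
    """Every offset at which pattern starts in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def count(data: bytes, pattern: bytes) -> int:
    """Rough analogue of #name: number of matches."""
    return len(match_offsets(data, pattern))


def at(data: bytes, pattern: bytes, offset: int) -> bool:
    """Rough analogue of $name at offset."""
    return offset in match_offsets(data, pattern)


def within(data: bytes, pattern: bytes, low: int, high: int) -> bool:
    """Rough analogue of $name in (low..high)."""
    return any(low <= off <= high for off in match_offsets(data, pattern))


# Hypothetical file layout: MZ header, padding, then two config markers.
sample = b"MZ" + b"\x00" * 1100 + b"CFG1" + b"\x00" * 10 + b"CFG1"

print(at(sample, b"MZ", 0), within(sample, b"CFG1", 1024, 4096), count(sample, b"CFG1"))
```

Thinking in terms of offset lists also explains why these operators are cheap: the expensive part is finding the matches, not filtering them.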

Practical Detection Scenarios with Full Annotated Rules

The following complete rules demonstrate production-quality YARA for common malware detection scenarios.

Detecting packed executables (UPX packer):

rule Packer_UPX
{
    meta:
        description = "Detects executables packed with UPX packer"
        author = "Decryption Digest"
        date = "2026-05-15"
        severity = "medium"

    strings:
        $upx0 = "UPX0" ascii
        $upx1 = "UPX1" ascii
        $upx_magic = { 55 50 58 21 }  // "UPX!" magic

    condition:
        uint16(0) == 0x5A4D  // PE file
        and ($upx0 and $upx1)
        and $upx_magic
}

Detecting macro-embedded PowerShell in Office documents:

rule Maldoc_PowerShell_Dropper
{
    meta:
        description = "Detects Office documents with embedded PowerShell download cradles"
        severity = "high"
        mitre_attack = "T1566.001, T1059.001"

    strings:
        $ps_download1 = "DownloadString" nocase wide ascii
        $ps_download2 = "DownloadFile" nocase wide ascii
        $ps_iex = "IEX" wide ascii fullword
        $ps_invoke = "Invoke-Expression" nocase wide ascii
        $office_magic = { D0 CF 11 E0 A1 B1 1A E1 }  // OLE2 compound document
        $ooxml = "xl/" ascii  // OOXML (Excel) part path; ZIP members are usually compressed, so inner strings may only match after extraction

    condition:
        ($office_magic at 0 or $ooxml)
        and (($ps_download1 or $ps_download2) and ($ps_iex or $ps_invoke))
}

Detecting Cobalt Strike beacon characteristics:

rule RAT_CobaltStrike_Beacon
{
    meta:
        description = "Detects Cobalt Strike beacon shellcode and payload characteristics"
        severity = "critical"
        reference = "https://www.cobaltstrike.com/"
        mitre_attack = "T1059.003, T1055"

    strings:
        $config_marker = { 00 01 BE EF }  // CS config block marker variant
        $watermark_pattern = { 69 68 69 68 }  // Start of a beacon config header XOR-encoded with key 0x69 (not a license watermark)
        $sleep_mask = "Sleep" wide ascii
        $pipe_pattern = "\\\\.\\pipe\\" wide ascii
        $cs_string1 = "%s as %s\\%s: %d" ascii  // CS format string
        $cs_string2 = "beacon.x64.dll" ascii nocase

    condition:
        filesize < 2MB
        and (2 of ($config_marker, $watermark_pattern, $cs_string1, $cs_string2))
        and ($pipe_pattern or $sleep_mask)
}

Note that Cobalt Strike rules require regular updates as threat actors rotate watermarks and modify configuration structures. These rules detect common patterns but should be supplemented with behavioral detection for fully customized CS deployments.

Testing Rules with the YARA CLI and YARA-X

A YARA rule that has not been tested against real samples is a hypothesis, not a detection. Rigorous testing requires both positive validation (the rule matches all known samples of the target) and negative validation (the rule does not match clean files).

Testing with the original YARA CLI:

The original YARA binary is yara (available on Linux/macOS via package manager, or built from source). Basic usage:

# Test a single rule file against a directory of samples
yara rule.yar /path/to/malware_samples/

# Test with verbose output showing matched strings
yara -s rule.yar sample.exe

# Test recursively through subdirectories
yara -r rule.yar /path/to/samples/

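# Scan the memory of a running process by PID (requires root/admin); 1234 is a placeholder PID
yara rule.yar 1234

# Compile rules once with yarac, then scan with the compiled rule set
yarac rule.yar rules.yac
yara -C rules.yac sample.exe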

Testing with YARA-X:

YARA-X uses the yr command:

# Scan a file with a rule
yr scan rule.yar sample.exe

# Compile rules and report any syntax errors or warnings
yr compile rule.yar

# Scan with newline-delimited JSON output for pipeline integration
yr scan --output-format ndjson rule.yar /path/to/samples/

YARA-X's compile step is stricter than original YARA and will flag some issues that YARA silently accepts, for example certain invalid escape sequences in regular expressions. (Strings that are defined in the strings block but never referenced in the condition are already a compile error in original YARA.) Fix these errors and warnings before deploying rules.

False positive testing:

Collect a corpus of clean files in the same category as your target. For PE executable rules, use a sample of 500 to 1,000 legitimate signed executables from known-good sources (Microsoft, major software vendors). Scan the entire corpus with your rule: yara -r rule.yar /path/to/clean_corpus/ > matches.txt. Any output in matches.txt is a false positive that requires rule refinement. Common refinements include adding fullword to string modifiers, narrowing hex patterns, adding file type checks via uint16(0), or requiring additional strings with AND logic.
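Once matches.txt exists, a short script can summarize it so you can see which rules are noisy. The Python sketch below assumes the default yara output format of one "RuleName path" pair per line (no -s or other extra output flags); the function names and the sample output are hypothetical.

```python
from collections import Counter


def tally_false_positives(yara_output: str) -> Counter:
    """Count clean-corpus matches per rule from default yara output."""
    counts = Counter()
    for line in yara_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Rule names cannot contain spaces, so the first space separates rule and path.
        rule_name, _, _path = line.partition(" ")
        counts[rule_name] += 1
    return counts


def false_positive_rate(yara_output: str, corpus_size: int) -> float:
    """Fraction of clean files that matched at least one rule."""
    flagged = {line.strip().partition(" ")[2]
               for line in yara_output.splitlines() if line.strip()}
    return len(flagged) / corpus_size


# Hypothetical contents of matches.txt after scanning a 1,000-file clean corpus.
sample_output = """Packer_UPX /clean/tools/compressor.exe
Maldoc_PowerShell_Dropper /clean/docs/admin_guide.doc
Packer_UPX /clean/tools/installer.exe
"""

print(tally_false_positives(sample_output).most_common())
print(false_positive_rate(sample_output, 1000))
```

Sorting by count points you at the rule that most needs tightening first.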

Integrating testing into a CI/CD pipeline:

For organizations maintaining a rule library in a Git repository, automate testing on every pull request using GitHub Actions or GitLab CI. Run yr compile *.yar to catch syntax errors, then scan a versioned sample corpus stored in Git LFS or an S3 bucket to detect regressions in rule coverage and precision.
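The regression step boils down to comparing what each rule matched against what it was expected to match. The Python sketch below shows one possible shape for that comparison. The manifest format, the function names, and the sample data are all hypothetical; a real pipeline would populate the actual results from scanner output.

```python
def regression_report(expected: dict, actual: dict) -> dict:
    """Compare expected and actual rule matches per sample file.

    Both arguments map a sample name to the set of rule names that matched it.
    """
    missed = {}
    unexpected = {}
    for sample in sorted(set(expected) | set(actual)):
        want = set(expected.get(sample, ()))
        got = set(actual.get(sample, ()))
        if want - got:
            missed[sample] = sorted(want - got)
        if got - want:
            unexpected[sample] = sorted(got - want)
    return {"missed": missed, "unexpected": unexpected}


def passes(report: dict) -> bool:
    """True when there are neither missed detections nor unexpected matches."""
    return not report["missed"] and not report["unexpected"]


# Hypothetical manifest checked into the repository next to the rules.
expected = {"upx_sample.bin": {"Packer_UPX"}, "notepad.exe": set()}
# Hypothetical scan results gathered in the CI job.
actual = {"upx_sample.bin": {"Packer_UPX"}, "notepad.exe": {"Packer_UPX"}}

report = regression_report(expected, actual)
print(report, passes(report))
```

Failing the build on either kind of difference keeps both detection coverage and false positive behavior from drifting silently between commits.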

Integrating YARA into Threat Hunting Workflows

The value of YARA rules compounds when integrated into platforms that can apply them at scale across live endpoints, retrospective file collections, or sandboxed samples.

VirusTotal Intelligence: VirusTotal Intelligence (VTI) subscribers can submit YARA rules for retrohunt against VirusTotal's full malware corpus (hundreds of billions of files). The retrohunt returns all files in the VTI corpus that match the rule, along with their analysis results, first submission dates, and file metadata. This is invaluable for: confirming a new rule does not generate excessive false positives at scale, discovering related samples of a malware family you are analyzing, and attributing new samples to known threat actor infrastructure. VTI also supports livehunt, which monitors new file submissions in real time and alerts when a submission matches your YARA rule.

Velociraptor: Velociraptor is an open-source DFIR platform that can deploy YARA scanning artifacts across thousands of endpoints simultaneously. Key artifacts:

Windows.Detection.Yara.File: scans files matching a glob pattern (e.g., C:\\Users\\**\\Downloads\\*.exe) with a specified YARA rule
Windows.Detection.Yara.Process: scans the memory of running processes matching a process name filter
Generic.Detection.Yara.Glob: cross-platform file scanning

Velociraptor returns matching file paths, matched string names and offsets, and process metadata for memory scans. Results are searchable in Velociraptor's investigation timeline and can be exported to a SIEM via the server-side event monitoring component.

CAPE Sandbox and Cuckoo: CAPE Sandbox (Config And Payload Extraction) and Cuckoo sandbox both support YARA rule integration for automatic classification of submitted samples. Rules stored in the data/yara/ directory of a CAPE installation are automatically run against the unpacked sample memory during execution, allowing classification of samples that would evade static file scanning due to packing.

EDR integration via YARA IOC feeds: For EDR platforms that support IOC-based hunting (CrowdStrike Custom IOAs, Tines playbooks, Microsoft Defender Custom Detection Rules), YARA matches from a Velociraptor hunt can be converted to file hash or string-based IOCs and pushed to the EDR for persistent monitoring. Although this is one step removed from live YARA scanning, it carries YARA findings into the EDR's continuous monitoring layer.
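The hash conversion step can be as simple as computing SHA-256 digests for the files a hunt flagged. The Python sketch below shows one way to do that with the standard library. The output record layout and the function names are hypothetical, and each EDR has its own import format that you would map these records onto.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_iocs(matched_paths) -> list:
    """Build one record per matched file, ready to be mapped to an EDR's IOC format."""
    return [{"path": str(path), "sha256": sha256_of_file(Path(path))}
            for path in matched_paths]


# Demo with a throwaway file standing in for a file flagged by a hunt.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "matched_sample.bin"
    demo_file.write_bytes(b"abc")
    demo_iocs = hash_iocs([demo_file])

print(demo_iocs[0]["sha256"])
```

Hash IOCs only catch byte-identical copies, so they complement the originating YARA rule rather than replace it.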

The bottom line

YARA is an enduring standard in the malware analysis and threat hunting toolkit because it solves a real problem with elegant simplicity: expressing what you know about malware in a portable, executable, and shareable format. Writing effective rules requires the discipline to test aggressively for false positives before deployment, the habit of combining multiple strings with AND logic rather than relying on single-string rules, and a workflow that integrates rules into the platforms where they generate operational value. Start with community rule sets, study how experienced researchers express detection logic, and build a library of validated rules that grows with your threat intelligence.

Frequently asked questions

What is YARA and how does it differ from Sigma or Snort rules?

YARA operates at the file or memory level, matching byte patterns and string content within individual files or process memory dumps. Sigma is a vendor-neutral SIEM rule format that describes log-based behavioral detections (event fields, counts, correlations) and compiles to platform-specific query languages like KQL or SPL. Snort and Suricata rules operate at the network packet level, matching against packet payloads and protocol fields in real-time traffic. The three formats are complementary: YARA handles static file analysis and memory hunting, Sigma handles log-based behavioral detection, and Snort/Suricata handle network traffic inspection. A mature threat hunting toolkit uses all three, with YARA deployed in EDR agents, sandboxes, and forensic platforms, Sigma in the SIEM, and Suricata at network chokepoints.

What are the most common YARA rule mistakes that cause false positives?

The most common mistake is writing rules based on overly common strings that appear in legitimate software. Strings like `cmd.exe`, `powershell`, or `CreateRemoteThread` appear in thousands of legitimate programs and create massive false positive volumes. Effective rules combine multiple strings with AND logic in the condition so that all strings must be present simultaneously, dramatically reducing false positive rates. A second common mistake is writing hex patterns that match too broadly, for example using long runs of wildcards (`?? ?? ??`) that end up matching legitimate binary patterns. A third mistake is not testing against a corpus of known-clean files before declaring the rule production-ready. Always scan a sample of benign files in the same category as your target (PE executables, Office documents) before deploying.

How do I write a YARA rule for a specific malware family?

Start by obtaining multiple samples of the target malware family (from VirusTotal, MalwareBazaar, or your own collection) and using a binary diffing tool like BinDiff or manually identifying shared byte sequences using a hex editor. Look for patterns that are unique to the malware: packed section names, encoded configuration strings, specific API import combinations, or byte sequences in the unpacked code. Write strings for two to five distinctive patterns and use AND logic in the condition. Validate the rule against the full sample set to confirm all samples match, then validate against a clean file corpus to confirm no false positives. Tools like yaraify.abuse.ch and VirusTotal retrohunt allow testing YARA rules against large corpora of known malware and clean files.

What is YARA-X and should I migrate to it?

YARA-X is a complete rewrite of the YARA engine in Rust, developed by VirusTotal and released publicly in 2024. It offers three to four times faster scan performance, stricter rule validation at compile time (catching errors that original YARA accepts silently), improved regular expression performance, and better error messages. YARA-X introduces breaking changes: some original YARA features behave differently or are not yet implemented, so rules written for YARA 4.x may not compile under YARA-X without modification. For new deployments and rule development workflows, YARA-X is the forward-looking choice. For production environments with large existing rule sets, validate rules against YARA-X in a test environment before migrating. The YARA-X CLI command is `yr` and its syntax is similar to the original `yara` command.

How do I test YARA rules without executing malware?

You do not need to execute malware to test YARA rules. The `yara` or `yr` (YARA-X) CLI scans files statically without executing them: `yara rule.yar /path/to/samples/` scans all files in the directory and reports matches. For memory scanning, use Volatility or Rekall to extract memory dumps from sandboxes or forensic captures and scan the dumps with YARA: `yara rule.yar memory.dmp`. Online platforms that accept YARA rules and scan against their corpus without you uploading malware samples include VirusTotal Intelligence YARA Hunting (subscription), UnpacMe, and MalwareBazaar's YARA search feature. For testing false positive rates, collect a corpus of clean files in the same file type category as your target and run the rule against that corpus.

How do I integrate YARA rules into my EDR or SIEM?

Integration paths depend on your platform. CrowdStrike Falcon supports custom IOA rules that can include file hash matching, and CrowdStrike's sandbox (Falcon Sandbox) runs YARA rules against submitted samples. SentinelOne supports custom detection rules based on indicators but does not natively execute arbitrary YARA on-agent. The best general-purpose on-agent YARA integration is Velociraptor, an open-source DFIR platform that supports the `Windows.Detection.Yara.Process` and `Windows.Detection.Yara.File` artifacts to run YARA rules against live process memory and files on endpoints at scale. For SIEM integration, file hash matches from YARA scans can be exported as IOCs and ingested into your SIEM as threat intelligence indicators for correlation against process and network logs.

Where can I find quality open-source YARA rules to start with?

The highest-quality curated repositories are: the Awesome-YARA repository on GitHub (links to over 50 curated rule sets), the ESET malware research YARA rules on GitHub (high-precision, well-tested rules for tracked APT groups), the Florian Roth signature-base repository (over 3,000 rules covering a wide range of malware families and threat actors), and the Elastic Security team's protections-artifacts repository (rules integrated with Elastic Defend). VirusTotal Intelligence subscribers can browse and retrohunt with community-shared YARA rules. For vendor-specific threat intelligence, many threat intelligence vendors (Mandiant, CrowdStrike Intelligence, Recorded Future) include YARA rules with their reporting. Always review and test any third-party rule before deploying it in production.

Author

Eric Bang, CISSP

Founder & Cybersecurity Evangelist, Decryption Digest

Cybersecurity professional with expertise in threat intelligence, vulnerability research, and enterprise security. Covers zero-days, ransomware, and nation-state operations for 50,000+ security professionals weekly.
