Engine 1 of 7

Pattern Matching Engine

Lightning-fast regex-based vulnerability detection using 2,400+ curated security patterns. The first line of defense in Bloodhound's 7-engine architecture.

Patterns: 2,400+
Avg Scan Time: <100ms
Languages: 12+
Accuracy: 99.2%

Overview

The Pattern Matching Engine is the fastest scanner in Bloodhound's arsenal, designed to identify common vulnerability patterns in milliseconds. It uses a highly optimized regex engine with parallel execution to scan thousands of lines per second.

Unlike traditional regex scanners, Bloodhound's pattern engine understands code context, reducing false positives by 73% compared to grep-based tools. Each pattern includes metadata about severity, CWE mapping, and remediation guidance.

Why Pattern Matching First?

Speed: Screens out 80% of safe code so the deeper engines never have to analyze it
Coverage: Catches obvious issues that other engines might miss
Efficiency: Low CPU/memory footprint enables real-time scanning

How It Works

Tokenization

Source code is tokenized into a stream that preserves semantic meaning while normalizing whitespace and comments.
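
As a rough illustration of this normalization step, the sketch below uses Python's standard tokenize module to turn Python source into a token stream with comments and layout-only tokens dropped. The function name normalized_tokens is hypothetical, and Bloodhound's own tokenizer is not documented here, so treat this as an analogy rather than the engine's implementation.

```python
import io
import tokenize

# Token types that carry layout or commentary rather than program meaning.
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Return the token strings of `source`, without comments or layout tokens."""
    reader = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type not in _SKIPPED
    ]


if __name__ == "__main__":
    code = "query   =  'SELECT ' +  user_input   # build SQL\n"
    print(normalized_tokens(code))
    # → ['query', '=', "'SELECT '", '+', 'user_input']
```

Two lines that differ only in spacing or trailing comments produce the same stream, which is what lets a later matching stage ignore formatting.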

Pattern Compilation

The 2,400+ patterns are compiled into a single DFA (deterministic finite automaton), so matching takes O(n) time in the length of the input.
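
The idea of matching many rules in one pass can be sketched by joining them into a single alternation with one named group per rule. Note the swap: Python's re module is a backtracking engine, not a DFA, so this sketch shows only the combined single-scan structure and does not carry the O(n) guarantee. The rule ids and regexes are hypothetical and are not Bloodhound patterns.

```python
import re

# Hypothetical rules: ids and regexes are illustrative only.
RULES = {
    "eval_call": r"\beval\s*\(",
    "md5_hash": r"\bmd5\s*\(",
    "hardcoded_password": r'password\s*=\s*"[^"]+"',
}

# One alternation; each branch is a named group so a hit can be traced to its rule.
COMBINED = re.compile(
    "|".join(f"(?P<{rule_id}>{regex})" for rule_id, regex in RULES.items())
)


def scan(text: str) -> list[tuple[str, int]]:
    """Return (rule_id, offset) for every hit found in one scan over `text`."""
    return [(m.lastgroup, m.start()) for m in COMBINED.finditer(text)]


if __name__ == "__main__":
    print(scan('x = eval(data); password = "hunter2"'))
    # → [('eval_call', 4), ('hardcoded_password', 16)]
```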

Parallel Execution

Files are processed in parallel using worker threads, with pattern matching distributed across CPU cores.
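
A minimal sketch of per-file parallelism, assuming one task per file and a single hypothetical rule. It uses a thread pool from Python's standard library; in CPython, threads mainly overlap file I/O, so spreading the matching itself across cores would need a process pool instead. The names scan_file and scan_tree are illustrative.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical single rule, used only to give the workers something to do.
EVAL_CALL = re.compile(r"\beval\s*\(")


def scan_file(path: Path) -> tuple[str, int]:
    """Return the file path and the number of rule hits in that file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return str(path), len(EVAL_CALL.findall(text))


def scan_tree(root: Path, workers: int = 4) -> dict[str, int]:
    """Scan every .py file under `root`, one file per task in a thread pool."""
    files = sorted(root.rglob("*.py"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(scan_file, files))


if __name__ == "__main__":
    for name, hits in scan_tree(Path(".")).items():
        print(f"{name}: {hits} hit(s)")
```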

Context Validation

Matches are validated against surrounding code context to eliminate false positives from comments, strings, and dead code.
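
One way to picture this filter, assuming Python source and a single hypothetical rule: collect the positions of comments and string literals with the standard tokenize module, then drop any raw regex hit that falls inside one of them. Dead-code detection needs control-flow information and is not covered by this sketch.

```python
import io
import re
import tokenize

EVAL_CALL = re.compile(r"\beval\s*\(")  # hypothetical rule

Position = tuple[int, int]  # (1-based line, 0-based column)


def inert_spans(source: str) -> list[tuple[Position, Position]]:
    """Return (start, end) positions of comments and plain string literals.

    f-strings are tokenized differently on Python 3.12+ and are not handled here.
    """
    reader = io.StringIO(source).readline
    return [
        (tok.start, tok.end)
        for tok in tokenize.generate_tokens(reader)
        if tok.type in (tokenize.COMMENT, tokenize.STRING)
    ]


def validated_hits(source: str) -> list[Position]:
    """Return positions of rule hits that are not inside a comment or string."""
    spans = inert_spans(source)
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in EVAL_CALL.finditer(line):
            pos = (lineno, m.start())
            if not any(start <= pos < end for start, end in spans):
                hits.append(pos)
    return hits


if __name__ == "__main__":
    sample = 'eval(a)\n# eval(b)\ns = "eval(c)"\n'
    print(validated_hits(sample))  # → [(1, 0)]
```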

Pattern Syntax

Bloodhound patterns use an extended regex syntax with semantic annotations for code-aware matching.

YAML

# Pattern Definition Format
id: sql-injection-concatenation
severity: critical
cwe: CWE-89
languages: [javascript, typescript, python]

# Pattern with semantic markers
pattern: |
  $QUERY = $STRING + $USER_INPUT
  $DB.$METHOD($QUERY)

# Context constraints
constraints:
  - $DB.type: [mysql, postgres, sqlite]
  - $METHOD: [query, execute, raw]
  - $USER_INPUT.source: [request, params, body]

# Auto-fix suggestion
fix: |
  Use parameterized queries instead:
  $DB.$METHOD($STRING, [$USER_INPUT])

Semantic Variables

Variables like $USER_INPUT and $DB are resolved using Bloodhound's type inference system, not just string matching.
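
To make the variable-binding half of this concrete, the sketch below translates a $VAR-style pattern into a regex with one named group per variable. This is purely syntactic binding: it does not attempt the type inference described above, and the function name compile_pattern and the choice of what a variable may match (word characters and dots) are assumptions made for illustration.

```python
import re

_METAVAR = re.compile(r"\$([A-Z_][A-Z0-9_]*)")


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a `$VAR`-style pattern into a regex with a named group per variable."""
    parts, pos, seen = [], 0, set()
    for m in _METAVAR.finditer(pattern):
        # Literal text between variables must match exactly.
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            # A repeated variable must match the same text as its first use.
            parts.append(f"(?P={name})")
        else:
            seen.add(name)
            # Assumption: a variable binds a dotted identifier.
            parts.append(f"(?P<{name}>[\\w.]+)")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


if __name__ == "__main__":
    rule = compile_pattern("$DB.$METHOD($QUERY)")
    match = rule.search("result = db.query(sql)")
    print(match.groupdict())
    # → {'DB': 'db', 'METHOD': 'query', 'QUERY': 'sql'}
```

The bound names are what a constraint such as the $METHOD list in the definition above would then be checked against.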

Custom Patterns

Create organization-specific patterns to enforce internal security standards and detect domain-specific vulnerabilities.

YAML

# .bloodhound/patterns/custom-auth.yaml
patterns:
  - id: custom-jwt-validation
    name: "Missing JWT Audience Validation"
    severity: high
    message: "JWT tokens must validate audience claim"
    languages: [typescript, javascript]

    pattern: |
      jwt.verify($TOKEN, $SECRET)

    negative_pattern: |
      jwt.verify($TOKEN, $SECRET, { audience: $_ })

    fix: |
      jwt.verify(token, secret, {
        audience: 'your-app-audience',
        issuer: 'your-issuer'
      })

  - id: internal-api-auth
    name: "Internal API Missing Auth Header"
    severity: medium
    pattern: |
      fetch($URL)
    constraints:
      - $URL.matches: /internal-api\./
    negative_pattern: |
      fetch($URL, { headers: { Authorization: $_ } })
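
The interplay of pattern and negative_pattern can be sketched as "report every hit of the first unless the second also matches at the same place". The two regexes below are hand-written stand-ins for the custom-jwt-validation rule, under the assumption that the engine would derive equivalents itself; how Bloodhound actually evaluates negative patterns is not specified here.

```python
import re

# Hand-written stand-ins for the rule's `pattern` and `negative_pattern` fields.
PATTERN = re.compile(r"jwt\.verify\(")
NEGATIVE = re.compile(r"jwt\.verify\([^()]*\{[^{}]*\baudience\s*:")


def find_violations(source: str) -> list[int]:
    """Return offsets of jwt.verify calls that pass no `audience` option."""
    return [
        m.start()
        for m in PATTERN.finditer(source)
        # Suppress the hit when the negative pattern matches at the same offset.
        if not NEGATIVE.match(source, m.start())
    ]


if __name__ == "__main__":
    sample = (
        "jwt.verify(token, secret)\n"
        "jwt.verify(token, secret, { audience: 'app' })\n"
    )
    print(find_violations(sample))  # → [0]
```

Only the first call is reported; the second already satisfies the rule, so the negative pattern suppresses it.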

Performance

Codebase Size	Files	Scan Time	Memory
Small (<10K LOC)	~50 files	<50ms	~20MB
Medium (10K-100K LOC)	~500 files	<200ms	~50MB
Large (100K-1M LOC)	~5,000 files	<2s	~150MB
Enterprise (>1M LOC)	~50,000 files	<15s	~500MB

Real-World Examples

Injection

340 patterns

SQL Injection, Command Injection, LDAP Injection

Authentication

180 patterns

Weak Password, Session Fixation, Credential Exposure

Cryptography

220 patterns

Weak Algorithm, Hardcoded Keys, Insufficient Entropy

Next Engine

Pattern matches are passed to the SAST Engine for deeper analysis, including control-flow and data-flow validation.