Verification Engine
Bayesian multi-signal hallucination detection with calibrated likelihood ratios, SPC baselines, and adaptive thresholds.
Overview
The VerificationEngine is the brain of agentguard v0.2's detection pipeline. It replaces the old HallucinationDetector with a rigorous Bayesian multi-signal fusion architecture that combines multiple zero-cost signals to compute a posterior probability of hallucination.
Instead of hard-coded rules, the engine uses calibrated likelihood ratios and the log-odds form of Bayes' theorem for numerical stability. Every signal either increases or decreases the probability that a tool call result was fabricated.
The VerificationEngine implements techniques from 14 academic papers on tool-call verification, Bayesian inference, and Statistical Process Control (SPC). See Sections 4.2 and 7.2 of the agentguard research paper.
Tiered Architecture
The engine runs a two-tier pipeline. Every check is zero-cost — no external API calls, no model inference, just math and statistics.
Tool Call Result + Execution Time
│
┌────────▼─────────────────────────┐
│ TIER 0: Zero-Cost Pre-Checks │
│ ┌─────────────────────────────┐ │
│ │ Schema validation │ │
│ │ Latency plausibility │ │
│ │ Pattern matching │ │
│ │ Response length bounds │ │
│ └─────────────────────────────┘ │
└────────┬─────────────────────────┘
│
┌────────▼─────────────────────────┐
│ Bayesian Signal Combiner │
│ Prior P(H) + Tier 0 signals │
│ → Posterior via log-odds update │
└────────┬─────────────────────────┘
│
┌─────▼─────┐
│ P(H)≥0.5? │──Yes──▶ BLOCK (skip Tier 1)
└─────┬─────┘
│ No
┌────────▼─────────────────────────┐
│ TIER 1: Post-Execution Checks │
│ ┌─────────────────────────────┐ │
│ │ SPC baseline anomaly │ │
│ │ Session consistency │ │
│ │ Cross-session consistency │ │
│ │ Value plausibility (SPC) │ │
│ └─────────────────────────────┘ │
└────────┬─────────────────────────┘
│
┌────────▼─────────────────────────┐
│ Bayesian Update (all signals) │
└────────┬─────────────────────────┘
│
┌─────▼──────────────────┐
│ P(H) < 0.2 → accept │
│ 0.2 ≤ P(H) < 0.5 → flag│
│ P(H) ≥ 0.5 → block │
└────────────────────────┘
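The verdict thresholds in the diagram above can be sketched as a small mapping function. This is an illustrative sketch of the documented decision boundaries, not the engine's actual internals; the function name is a hypothetical placeholder.

```python
# Hypothetical sketch of the verdict mapping shown in the diagram;
# thresholds (0.2 and 0.5) are the documented defaults.
def verdict_for(posterior: float) -> str:
    """Map a posterior P(hallucination) to a verdict string."""
    if posterior >= 0.5:
        return "block"
    if posterior >= 0.2:
        return "flag"
    return "accept"

print(verdict_for(0.127))  # accept
print(verdict_for(0.35))   # flag
```

Note that a Tier 0 posterior at or above 0.5 blocks immediately without running Tier 1 checks at all.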
Quick Start
from agentguard.verification import VerificationEngine

engine = VerificationEngine()

# Register a tool profile with expected behaviour
engine.register_tool_profile("get_weather",
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity"],
    has_network_io=True,
)

# Verify a tool call result
result = engine.verify("get_weather",
    args={"city": "London"},
    result={"temperature": 18, "humidity": 65},
    execution_time_ms=350.0,
)

print(result.verdict)       # "accept"
print(result.confidence)    # 0.127
print(result.signals)       # {signal_name: SignalResult(...)}
print(result.tier_reached)  # VerificationTier.TIER_1
If you don’t register a tool profile, the engine uses sensible defaults (2ms–60s latency range, no schema checks). Register profiles for tools where you know the expected behaviour.
Signal Types & Likelihood Ratios
Each signal has a default likelihood ratio (LR) — the ratio P(signal fires | hallucination) / P(signal fires | not hallucination). Higher LR = stronger evidence of hallucination when the signal fires.
| Signal | Default LR | Tier | What it detects |
|---|---|---|---|
| schema_mismatch | 12.0 | 0 | Missing required fields or forbidden fields present |
| pattern_mismatch | 6.0 | 0 | Response doesn’t match any expected regex patterns |
| latency_anomaly | 3.5 | 0 | Impossibly fast execution (<2ms = near-certain fabrication) |
| length_anomaly | 2.0 | 0 | Response too short or too long vs. configured bounds |
| historical_inconsistency | 4.5 | 1 | Differs from historical results for the same args |
| session_inconsistency | 4.0 | 1 | Contradicts earlier results in the same session |
| spc_anomaly | 3.0 | 1 | Statistical outlier vs. baseline (Western Electric rules) |
| value_plausibility | 3.0 | 1 | Numeric field values outside mean ± 3σ |
How the Bayesian Combiner Works
The engine uses the log-odds form of Bayes' theorem for numerical stability:
# Internal log-odds update (simplified)
log_odds = log(prior / (1 - prior))
for signal in signals:
    if signal.fired:
        effective_lr = 1.0 + (signal.likelihood_ratio - 1.0) * signal.score
        log_odds += log(effective_lr)
    else:
        # Signal absent: slight update toward no hallucination
        absent_lr = 1.0 / max(signal.likelihood_ratio * 0.1, 1.01)
        log_odds += log(absent_lr)
posterior = exp(log_odds) / (1 + exp(log_odds))
Signals that don’t fire also provide evidence — their absence slightly reduces the posterior probability of hallucination.
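To make the update concrete, here is a self-contained, runnable version of the same log-odds arithmetic. The `Signal` dataclass is a stand-in for illustration (the library's own SignalResult may differ), but the update rule mirrors the simplified pseudocode above.

```python
from dataclasses import dataclass
from math import exp, log

@dataclass
class Signal:
    fired: bool
    likelihood_ratio: float
    score: float = 1.0  # how strongly the signal fired, in [0, 1]

def combine(prior: float, signals: list[Signal]) -> float:
    """Fuse signals into a posterior P(hallucination) via log-odds."""
    log_odds = log(prior / (1 - prior))
    for s in signals:
        if s.fired:
            # Scale the LR by the signal score so weak firings move less
            effective_lr = 1.0 + (s.likelihood_ratio - 1.0) * s.score
            log_odds += log(effective_lr)
        else:
            # Absent signal: mild evidence against hallucination
            log_odds += log(1.0 / max(s.likelihood_ratio * 0.1, 1.01))
    return exp(log_odds) / (1 + exp(log_odds))

# Schema mismatch (LR 12.0) fires; latency anomaly (LR 3.5) does not.
# With a 5% prior, the posterior lands around 0.38.
p = combine(0.05, [Signal(True, 12.0), Signal(False, 3.5)])
```

A single strong signal like schema_mismatch is enough to pull a 5% prior close to the 0.5 blocking threshold; two strong signals together will cross it.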
Registering Tool Profiles
Profiles tell the engine what a real tool response looks like:
engine = VerificationEngine()
# Network API tool — strict latency and schema checks
engine.register_tool_profile("search_web",
    expected_latency_ms=(100, 5000),
    required_fields=["results", "total_count"],
    forbidden_fields=["error", "mock"],
    response_patterns=[r'"results"\s*:\s*\['],
    min_response_length=50,
    max_response_length=100000,
    has_network_io=True,
)

# Database query — different latency profile
engine.register_tool_profile("query_db",
    expected_latency_ms=(5, 30000),
    required_fields=["rows"],
    has_network_io=True,
)

# Pure computation — very fast is expected
engine.register_tool_profile("calculate",
    expected_latency_ms=(0.1, 100),
    has_network_io=False,  # No network I/O expected
)
ToolProfile Fields
| Field | Type | Default | Description |
|---|---|---|---|
| expected_latency_ms | (float, float) | (50, 30000) | Min/max plausible latency range |
| required_fields | list[str] | [] | Fields that must appear in real responses |
| forbidden_fields | list[str] | [] | Fields that should never appear |
| response_patterns | list[str] | [] | Regex patterns that real responses match |
| min_response_length | int \| None | None | Minimum JSON-serialised length |
| max_response_length | int \| None | None | Maximum JSON-serialised length |
| has_network_io | bool | True | Whether the tool makes network calls |
SPC Baselines
The engine maintains Statistical Process Control (SPC) baselines for each tool using Welford’s online algorithm for running mean/variance, and applies the four Western Electric rules:
| Rule | Condition | Weight |
|---|---|---|
| Rule 1 | 1 point beyond 3σ from mean | 0.40 |
| Rule 2 | 2 of last 3 points beyond 2σ (same side) | 0.25 |
| Rule 3 | 4 of last 5 points beyond 1σ (same side) | 0.20 |
| Rule 4 | 8 consecutive points on same side of mean | 0.15 |
SPC checks require at least 8 prior observations before they activate. The baseline tracks latency, response size, field frequency, and per-field numeric ranges — all using RunningStats (Welford’s algorithm with a 100-observation circular buffer).
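The core of the baseline, Welford's online mean/variance plus a Rule 1 (3σ) check, can be sketched in a few lines. This is a minimal illustration, not the library's RunningStats: for simplicity the window here is kept only as a bounded history, while the running moments accumulate over all observations (a full windowed implementation would also evict old points from the moments).

```python
from collections import deque
from math import sqrt

class RunningStats:
    """Welford's online mean/variance with a bounded history window."""

    def __init__(self, maxlen: int = 100):
        self.window = deque(maxlen=maxlen)  # recent points for rule checks
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def add(self, x: float) -> None:
        self.window.append(x)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def beyond_3_sigma(self, x: float) -> bool:
        # Western Electric Rule 1, gated on the 8-observation minimum
        if self.n < 8 or self.std == 0:
            return False
        return abs(x - self.mean) > 3 * self.std

stats = RunningStats()
for latency_ms in [200, 210, 195, 205, 198, 202, 207, 199]:
    stats.add(latency_ms)
print(stats.beyond_3_sigma(1.0))  # True — implausibly fast for this tool
```

Welford's algorithm is the standard choice here because it updates mean and variance in O(1) per observation without storing or re-summing the full history, and it avoids the catastrophic cancellation of the naive sum-of-squares formula.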
Cross-Session Consistency
The ConsistencyTracker detects implausible swings in tool outputs:
- Session consistency: Compares against the last 10 results in the current session. A 50× change in any numeric field triggers a violation.
- Historical consistency: Hashes tool arguments and compares results with the same args across sessions. If get_stock_price("NVDA") returned ~$650 for the last 50 calls but now returns $50, that’s flagged.
result = engine.verify("get_stock_price",
    args={"ticker": "NVDA"},
    result={"price": 50.0, "currency": "USD"},
    execution_time_ms=250.0,
    session_id="session-123",  # Enable session consistency
)

# If historical calls for NVDA returned ~$650, this will fire:
# result.signals["historical_inconsistency"].fired == True
# result.signals["historical_inconsistency"].detail == "Field 'price': ..."
print(result.verdict)  # "flag" or "block" depending on other signals
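The 50× session rule described above amounts to a ratio check over numeric fields. The sketch below is an assumption about how such a check could look; the function name and exact comparison are illustrative, not the library's code.

```python
# Illustrative sketch of a "50x swing" check between two results of the
# same tool; the library's actual comparison logic may differ.
def implausible_swing(prev: dict, curr: dict, factor: float = 50.0) -> list[str]:
    """Return numeric fields whose value changed by more than `factor`x."""
    violations = []
    for key, new in curr.items():
        old = prev.get(key)
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            if old != 0 and new != 0:
                ratio = max(abs(new / old), abs(old / new))
                if ratio >= factor:
                    violations.append(key)
    return violations

print(implausible_swing({"price": 650.0}, {"price": 5.0}))    # ['price']
print(implausible_swing({"price": 650.0}, {"price": 640.0}))  # []
```

Using a ratio rather than an absolute delta keeps the rule scale-free: it treats a $650 → $5 stock price and a 0.65 → 0.005 exchange rate as the same magnitude of anomaly.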
VerificationResult
Every call to engine.verify() returns a VerificationResult:
| Field | Type | Description |
|---|---|---|
| verdict | str | "accept", "flag", or "block" |
| confidence | float | P(hallucination) in [0.0, 1.0] |
| signals | dict[str, SignalResult] | Per-signal details |
| tier_reached | VerificationTier | TIER_0 or TIER_1 |
| explanation | str | Human-readable summary |
| prior | float | Prior P(H) used |
| posterior | float | Final P(H) after all signals |
| is_hallucinated | bool | True when verdict is “block” (backward compat) |
Adaptive Thresholds
The engine learns per-tool thresholds from feedback using Exponential Moving Average (EMA) updates. As you provide labelled feedback, thresholds adapt to each tool’s actual hallucination rate:
# After reviewing a result, provide feedback
engine.record_feedback("search_web",
    confidence_score=0.35,
    was_hallucination=True,  # This was actually hallucinated
)

# The engine will automatically:
# 1. Update the EMA hallucination rate for search_web
# 2. Lower the blocking threshold (be stricter)
# 3. Adjust per-tool prior for future verifications
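The EMA step itself is the standard exponential-moving-average update. The sketch below shows the arithmetic under an assumed smoothing factor `alpha`; the engine's actual update rule and constants are not specified here.

```python
# Sketch of an EMA update of the observed hallucination rate;
# alpha is an assumed smoothing factor, not a documented constant.
def update_ema(rate: float, was_hallucination: bool, alpha: float = 0.1) -> float:
    """Blend a new 0/1 observation into the running hallucination rate."""
    observation = 1.0 if was_hallucination else 0.0
    return (1 - alpha) * rate + alpha * observation

rate = 0.05
rate = update_ema(rate, True)  # a confirmed hallucination raises the rate
print(round(rate, 3))  # 0.145
```

Because each observation is weighted by `alpha`, recent feedback dominates while old history decays geometrically, which is what lets per-tool thresholds track a tool's current hallucination rate rather than its lifetime average.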
See the Calibration guide for the complete tuning workflow.