Verification Engine
Bayesian multi-signal hallucination detection with calibrated likelihood ratios, SPC baselines, and adaptive thresholds.
Overview
The VerificationEngine is the brain of agentguard v0.2's detection pipeline. It replaces the old HallucinationDetector with a rigorous Bayesian multi-signal fusion architecture that combines multiple zero-cost signals to compute a posterior probability of hallucination.
Instead of hard-coded rules, the engine uses calibrated likelihood ratios and the log-odds form of Bayes' theorem for numerical stability. Every signal either increases or decreases the probability that a tool call result was fabricated.
The VerificationEngine implements techniques from 14 academic papers on tool-call verification, Bayesian inference, and Statistical Process Control (SPC). See Sections 4.2 and 7.2 of the agentguard research paper.
Tiered Architecture
The engine runs a two-tier pipeline. Every check is zero-cost — no external API calls, no model inference, just math and statistics.
Tool Call Result + Execution Time
│
┌────────▼─────────────────────────┐
│ TIER 0: Zero-Cost Pre-Checks │
│ ┌─────────────────────────────┐ │
│ │ Schema validation │ │
│ │ Latency plausibility │ │
│ │ Pattern matching │ │
│ │ Response length bounds │ │
│ └─────────────────────────────┘ │
└────────┬─────────────────────────┘
│
┌────────▼─────────────────────────┐
│ Bayesian Signal Combiner │
│ Prior P(H) + Tier 0 signals │
│ → Posterior via log-odds update │
└────────┬─────────────────────────┘
│
┌─────▼─────┐
│ P(H)≥0.5? │──Yes──▶ BLOCK (skip Tier 1)
└─────┬─────┘
│ No
┌────────▼─────────────────────────┐
│ TIER 1: Post-Execution Checks │
│ ┌─────────────────────────────┐ │
│ │ SPC baseline anomaly │ │
│ │ Session consistency │ │
│ │ Cross-session consistency │ │
│ │ Value plausibility (SPC) │ │
│ └─────────────────────────────┘ │
└────────┬─────────────────────────┘
│
┌────────▼─────────────────────────┐
│ Bayesian Update (all signals) │
└────────┬─────────────────────────┘
│
┌─────▼──────────────────┐
│ P(H) < 0.2 → accept │
│ 0.2 ≤ P(H) < 0.5 → flag│
│ P(H) ≥ 0.5 → block │
└────────────────────────┘
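The verdict thresholds in the diagram above can be sketched as a small mapping function. This is an illustrative sketch of the documented decision boundaries, not the engine's actual internals; the function name is a hypothetical placeholder.

```python
# Hypothetical sketch of the verdict mapping shown in the diagram;
# thresholds (0.2 and 0.5) are the documented defaults.
def verdict_for(posterior: float) -> str:
    """Map a posterior P(hallucination) to a verdict string."""
    if posterior >= 0.5:
        return "block"
    if posterior >= 0.2:
        return "flag"
    return "accept"

print(verdict_for(0.127))  # accept
print(verdict_for(0.35))   # flag
```

Note that a Tier 0 posterior at or above 0.5 blocks immediately without running Tier 1 checks at all.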
Quick Start
from agentguard.verification import VerificationEngine

engine = VerificationEngine()

# Register a tool profile with expected behaviour
engine.register_tool_profile("get_weather",
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity"],
    has_network_io=True,
)

# Verify a tool call result
result = engine.verify("get_weather",
    args={"city": "London"},
    result={"temperature": 18, "humidity": 65},
    execution_time_ms=350.0,
)

print(result.verdict)       # "accept"
print(result.confidence)    # 0.127
print(result.signals)       # {signal_name: SignalResult(...)}
print(result.tier_reached)  # VerificationTier.TIER_1
If you don’t register a tool profile, the engine uses sensible defaults (2ms–60s latency range, no schema checks). Register profiles for tools where you know the expected behaviour.
Signal Types & Likelihood Ratios
Each signal has a default likelihood ratio (LR) — the ratio P(signal fires | hallucination) / P(signal fires | not hallucination). Higher LR = stronger evidence of hallucination when the signal fires.
| Signal | Default LR | Tier | What it detects |
|---|---|---|---|
| schema_mismatch | 12.0 | 0 | Missing required fields or forbidden fields present |
| pattern_mismatch | 6.0 | 0 | Response doesn’t match any expected regex patterns |
| latency_anomaly | 3.5 | 0 | Impossibly fast execution (<2ms = near-certain fabrication) |
| length_anomaly | 2.0 | 0 | Response too short or too long vs. configured bounds |
| historical_inconsistency | 4.5 | 1 | Differs from historical results for the same args |
| session_inconsistency | 4.0 | 1 | Contradicts earlier results in the same session |
| spc_anomaly | 3.0 | 1 | Statistical outlier vs. baseline (Western Electric rules) |
| value_plausibility | 3.0 | 1 | Numeric field values outside mean ± 3σ |
How the Bayesian Combiner Works
The engine uses the log-odds form of Bayes' theorem for numerical stability:
# Internal log-odds update (simplified)
log_odds = log(prior / (1 - prior))
for signal in signals:
    if signal.fired:
        effective_lr = 1.0 + (signal.likelihood_ratio - 1.0) * signal.score
        log_odds += log(effective_lr)
    else:
        # Signal absent: slight update toward no hallucination
        absent_lr = 1.0 / max(signal.likelihood_ratio * 0.1, 1.01)
        log_odds += log(absent_lr)
posterior = exp(log_odds) / (1 + exp(log_odds))
Signals that don’t fire also provide evidence — their absence slightly reduces the posterior probability of hallucination.
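To make the update concrete, here is a self-contained, runnable version of the same log-odds arithmetic. The `Signal` dataclass is a stand-in for illustration (the library's own SignalResult may differ), but the update rule mirrors the simplified pseudocode above.

```python
from dataclasses import dataclass
from math import exp, log

@dataclass
class Signal:
    fired: bool
    likelihood_ratio: float
    score: float = 1.0  # how strongly the signal fired, in [0, 1]

def combine(prior: float, signals: list[Signal]) -> float:
    """Fuse signals into a posterior P(hallucination) via log-odds."""
    log_odds = log(prior / (1 - prior))
    for s in signals:
        if s.fired:
            # Scale the LR by the signal score so weak firings move less
            effective_lr = 1.0 + (s.likelihood_ratio - 1.0) * s.score
            log_odds += log(effective_lr)
        else:
            # Absent signal: mild evidence against hallucination
            log_odds += log(1.0 / max(s.likelihood_ratio * 0.1, 1.01))
    return exp(log_odds) / (1 + exp(log_odds))

# Schema mismatch (LR 12.0) fires; latency anomaly (LR 3.5) does not.
# With a 5% prior, the posterior lands around 0.38.
p = combine(0.05, [Signal(True, 12.0), Signal(False, 3.5)])
```

A single strong signal like schema_mismatch is enough to pull a 5% prior close to the 0.5 blocking threshold; two strong signals together will cross it.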
Registering Tool Profiles
Profiles tell the engine what a real tool response looks like:
engine = VerificationEngine()
# Network API tool — strict latency and schema checks
engine.register_tool_profile("search_web",
    expected_latency_ms=(100, 5000),
    required_fields=["results", "total_count"],
    forbidden_fields=["error", "mock"],
    response_patterns=[r'"results"\s*:\s*\['],
    min_response_length=50,
    max_response_length=100000,
    has_network_io=True,
)

# Database query — different latency profile
engine.register_tool_profile("query_db",
    expected_latency_ms=(5, 30000),
    required_fields=["rows"],
    has_network_io=True,
)

# Pure computation — very fast is expected
engine.register_tool_profile("calculate",
    expected_latency_ms=(0.1, 100),
    has_network_io=False,  # No network I/O expected
)
ToolProfile Fields
| Field | Type | Default | Description |
|---|---|---|---|
| expected_latency_ms | (float, float) | (50, 30000) | Min/max plausible latency range |
| required_fields | list[str] | [] | Fields that must appear in real responses |
| forbidden_fields | list[str] | [] | Fields that should never appear |
| response_patterns | list[str] | [] | Regex patterns that real responses match |
| min_response_length | int \| None | None | Minimum JSON-serialised length |
| max_response_length | int \| None | None | Maximum JSON-serialised length |
| has_network_io | bool | True | Whether the tool makes network calls |
SPC Baselines
The engine maintains Statistical Process Control (SPC) baselines for each tool using Welford’s online algorithm for running mean/variance, and applies the four Western Electric rules:
| Rule | Condition | Weight |
|---|---|---|
| Rule 1 | 1 point beyond 3σ from mean | 0.40 |
| Rule 2 | 2 of last 3 points beyond 2σ (same side) | 0.25 |
| Rule 3 | 4 of last 5 points beyond 1σ (same side) | 0.20 |
| Rule 4 | 8 consecutive points on same side of mean | 0.15 |
SPC checks require at least 8 prior observations before they activate. The baseline tracks latency, response size, field frequency, and per-field numeric ranges — all using RunningStats (Welford’s algorithm with a 100-observation circular buffer).
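The core of the baseline, Welford's online mean/variance plus a Rule 1 (3σ) check, can be sketched in a few lines. This is a minimal illustration, not the library's RunningStats: for simplicity the window here is kept only as a bounded history, while the running moments accumulate over all observations (a full windowed implementation would also evict old points from the moments).

```python
from collections import deque
from math import sqrt

class RunningStats:
    """Welford's online mean/variance with a bounded history window."""

    def __init__(self, maxlen: int = 100):
        self.window = deque(maxlen=maxlen)  # recent points for rule checks
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def add(self, x: float) -> None:
        self.window.append(x)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def beyond_3_sigma(self, x: float) -> bool:
        # Western Electric Rule 1, gated on the 8-observation minimum
        if self.n < 8 or self.std == 0:
            return False
        return abs(x - self.mean) > 3 * self.std

stats = RunningStats()
for latency_ms in [200, 210, 195, 205, 198, 202, 207, 199]:
    stats.add(latency_ms)
print(stats.beyond_3_sigma(1.0))  # True — implausibly fast for this tool
```

Welford's algorithm is the standard choice here because it updates mean and variance in O(1) per observation without storing or re-summing the full history, and it avoids the catastrophic cancellation of the naive sum-of-squares formula.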
Cross-Session Consistency
The ConsistencyTracker detects implausible swings in tool outputs:
- Session consistency: Compares against the last 10 results in the current session. A 50× change in any numeric field triggers a violation.
- Historical consistency: Hashes tool arguments and compares results with the same args across sessions. If get_stock_price("NVDA") returned ~$650 for the last 50 calls but now returns $50, that’s flagged.
result = engine.verify("get_stock_price",
    args={"ticker": "NVDA"},
    result={"price": 50.0, "currency": "USD"},
    execution_time_ms=250.0,
    session_id="session-123",  # Enable session consistency
)

# If historical calls for NVDA returned ~$650, this will fire:
# result.signals["historical_inconsistency"].fired == True
# result.signals["historical_inconsistency"].detail == "Field 'price': ..."
print(result.verdict)  # "flag" or "block" depending on other signals
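The 50× session rule described above amounts to a ratio check over numeric fields. The sketch below is an assumption about how such a check could look; the function name and exact comparison are illustrative, not the library's code.

```python
# Illustrative sketch of a "50x swing" check between two results of the
# same tool; the library's actual comparison logic may differ.
def implausible_swing(prev: dict, curr: dict, factor: float = 50.0) -> list[str]:
    """Return numeric fields whose value changed by more than `factor`x."""
    violations = []
    for key, new in curr.items():
        old = prev.get(key)
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            if old != 0 and new != 0:
                ratio = max(abs(new / old), abs(old / new))
                if ratio >= factor:
                    violations.append(key)
    return violations

print(implausible_swing({"price": 650.0}, {"price": 5.0}))    # ['price']
print(implausible_swing({"price": 650.0}, {"price": 640.0}))  # []
```

Using a ratio rather than an absolute delta keeps the rule scale-free: it treats a $650 → $5 stock price and a 0.65 → 0.005 exchange rate as the same magnitude of anomaly.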
VerificationResult
Every call to engine.verify() returns a VerificationResult:
| Field | Type | Description |
|---|---|---|
| verdict | str | "accept", "flag", or "block" |
| confidence | float | P(hallucination) in [0.0, 1.0] |
| signals | dict[str, SignalResult] | Per-signal details |
| tier_reached | VerificationTier | TIER_0 or TIER_1 |
| explanation | str | Human-readable summary |
| prior | float | Prior P(H) used |
| posterior | float | Final P(H) after all signals |
| is_hallucinated | bool | True when verdict is “block” (backward compat) |
Adaptive Thresholds
The engine learns per-tool thresholds from feedback using Exponential Moving Average (EMA) updates. As you provide labelled feedback, thresholds adapt to each tool’s actual hallucination rate:
# After reviewing a result, provide feedback
engine.record_feedback("search_web",
    confidence_score=0.35,
    was_hallucination=True,  # This was actually hallucinated
)

# The engine will automatically:
# 1. Update the EMA hallucination rate for search_web
# 2. Lower the blocking threshold (be stricter)
# 3. Adjust per-tool prior for future verifications
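The EMA step itself is the standard exponential-moving-average update. The sketch below shows the arithmetic under an assumed smoothing factor `alpha`; the engine's actual update rule and constants are not specified here.

```python
# Sketch of an EMA update of the observed hallucination rate;
# alpha is an assumed smoothing factor, not a documented constant.
def update_ema(rate: float, was_hallucination: bool, alpha: float = 0.1) -> float:
    """Blend a new 0/1 observation into the running hallucination rate."""
    observation = 1.0 if was_hallucination else 0.0
    return (1 - alpha) * rate + alpha * observation

rate = 0.05
rate = update_ema(rate, True)  # a confirmed hallucination raises the rate
print(round(rate, 3))  # 0.145
```

Because each observation is weighted by `alpha`, recent feedback dominates while old history decays geometrically, which is what lets per-tool thresholds track a tool's current hallucination rate rather than its lifetime average.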
See the Calibration guide for the complete tuning workflow.