Verification Engine

Bayesian multi-signal hallucination detection with calibrated likelihood ratios, SPC baselines, and adaptive thresholds.

Overview

The VerificationEngine is the brain of agentguard v0.2's detection pipeline. It replaces the old HallucinationDetector with a rigorous Bayesian multi-signal fusion architecture that combines multiple zero-cost signals to compute a posterior probability of hallucination.

Instead of hard-coded rules, the engine uses calibrated likelihood ratios and the log-odds form of Bayes' theorem for numerical stability. Every signal either increases or decreases the probability that a tool call result was fabricated.

Powered by research

The VerificationEngine implements techniques from 14 academic papers on tool-call verification, Bayesian inference, and Statistical Process Control. See Sections 4.2 and 7.2 of the agentguard research paper.

Tiered Architecture

The engine runs a two-tier pipeline. Every check is zero-cost — no external API calls, no model inference, just math and statistics.

  Tool Call Result + Execution Time
           │
  ┌────────▼─────────────────────────┐
  │  TIER 0: Zero-Cost Pre-Checks    │
  │  ┌─────────────────────────────┐ │
  │  │ Schema validation           │ │
  │  │ Latency plausibility        │ │
  │  │ Pattern matching            │ │
  │  │ Response length bounds      │ │
  │  └─────────────────────────────┘ │
  └────────┬─────────────────────────┘
           │
  ┌────────▼─────────────────────────┐
  │  Bayesian Signal Combiner        │
  │  Prior P(H) + Tier 0 signals     │
  │  → Posterior via log-odds update │
  └────────┬─────────────────────────┘
           │
     ┌─────▼─────┐
     │ P(H)≥0.5? │──Yes──▶ BLOCK (skip Tier 1)
     └─────┬─────┘
           │ No
  ┌────────▼─────────────────────────┐
  │  TIER 1: Post-Execution Checks   │
  │  ┌─────────────────────────────┐ │
  │  │ SPC baseline anomaly        │ │
  │  │ Session consistency         │ │
  │  │ Cross-session consistency   │ │
  │  │ Value plausibility (SPC)    │ │
  │  └─────────────────────────────┘ │
  └────────┬─────────────────────────┘
           │
  ┌────────▼─────────────────────────┐
  │  Bayesian Update (all signals)   │
  └────────┬─────────────────────────┘
           │
     ┌─────▼──────────────────┐
     │ P(H) < 0.2 → accept    │
     │ 0.2 ≤ P(H) < 0.5 → flag│
     │ P(H) ≥ 0.5 → block     │
     └────────────────────────┘
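The three-way decision at the bottom of the pipeline maps the posterior P(H) to a verdict. A sketch of that mapping (threshold constants taken from the diagram above; the function name is illustrative, not part of the library API):

```python
def verdict_for(posterior: float) -> str:
    """Map posterior P(hallucination) to a verdict, per the pipeline diagram."""
    if posterior >= 0.5:
        return "block"   # strong evidence of fabrication
    if posterior >= 0.2:
        return "flag"    # suspicious; surface for review
    return "accept"      # consistent with a genuine tool response
```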

Quick Start

```python
from agentguard.verification import VerificationEngine

engine = VerificationEngine()

# Register a tool profile with expected behaviour
engine.register_tool_profile("get_weather",
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity"],
    has_network_io=True,
)

# Verify a tool call result
result = engine.verify("get_weather",
    args={"city": "London"},
    result={"temperature": 18, "humidity": 65},
    execution_time_ms=350.0,
)

print(result.verdict)       # "accept"
print(result.confidence)    # 0.127
print(result.signals)       # {signal_name: SignalResult(...)}
print(result.tier_reached)  # VerificationTier.TIER_1
```

Zero config works

If you don’t register a tool profile, the engine uses sensible defaults (2ms–60s latency range, no schema checks). Register profiles for tools where you know the expected behaviour.

Signal Types & Likelihood Ratios

Each signal has a default likelihood ratio (LR) — the ratio P(signal fires | hallucination) / P(signal fires | not hallucination). Higher LR = stronger evidence of hallucination when the signal fires.

| Signal | Default LR | Tier | What it detects |
|---|---|---|---|
| schema_mismatch | 12.0 | 0 | Missing required fields or forbidden fields present |
| pattern_mismatch | 6.0 | 0 | Response doesn't match any expected regex patterns |
| latency_anomaly | 3.5 | 0 | Impossibly fast execution (<2ms = near-certain fabrication) |
| length_anomaly | 2.0 | 0 | Response too short or too long vs. configured bounds |
| historical_inconsistency | 4.5 | 1 | Differs from historical results for the same args |
| session_inconsistency | 4.0 | 1 | Contradicts earlier results in the same session |
| spc_anomaly | 3.0 | 1 | Statistical outlier vs. baseline (Western Electric rules) |
| value_plausibility | 3.0 | 1 | Numeric field values outside mean ± 3σ |
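As a worked example of what a likelihood ratio does, assume a prior P(H) of 0.05 (an illustrative value; the engine's actual default prior is not specified here) and suppose only schema_mismatch fires at full strength (LR = 12). In odds form:

```python
prior = 0.05                       # assumed prior P(H); illustrative only
prior_odds = prior / (1 - prior)   # ≈ 0.0526
posterior_odds = prior_odds * 12.0 # schema_mismatch default LR
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 3))  # 0.387
```

A single full-strength schema mismatch at that prior already lands in "flag" territory (0.2 ≤ P(H) < 0.5).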

How the Bayesian Combiner Works

The engine uses the log-odds form of Bayes' theorem for numerical stability:

```python
from math import exp, log

# Internal log-odds update (simplified)
log_odds = log(prior / (1 - prior))

for signal in signals:
    if signal.fired:
        # Scale the LR by the signal's score in [0, 1]
        effective_lr = 1.0 + (signal.likelihood_ratio - 1.0) * signal.score
        log_odds += log(effective_lr)
    else:
        # Signal absent: slight update toward no hallucination
        absent_lr = 1.0 / max(signal.likelihood_ratio * 0.1, 1.01)
        log_odds += log(absent_lr)

posterior = exp(log_odds) / (1 + exp(log_odds))
```

Signals that don’t fire also provide evidence — their absence slightly reduces the posterior probability of hallucination.
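A runnable sketch of that effect, using the eight default LRs from the signal table (the Sig tuple and the 0.05 prior are illustrative stand-ins, not the library's types):

```python
from math import exp, log
from typing import NamedTuple

class Sig(NamedTuple):           # stand-in for the engine's SignalResult
    fired: bool
    likelihood_ratio: float
    score: float = 1.0

def combine(prior: float, signals: list[Sig]) -> float:
    log_odds = log(prior / (1 - prior))
    for s in signals:
        if s.fired:
            log_odds += log(1.0 + (s.likelihood_ratio - 1.0) * s.score)
        else:
            log_odds += log(1.0 / max(s.likelihood_ratio * 0.1, 1.01))
    return exp(log_odds) / (1 + exp(log_odds))

# Eight signals, none firing: posterior drops below the 0.05 prior
quiet = combine(0.05, [Sig(False, lr) for lr in (12, 6, 3.5, 2, 4.5, 4, 3, 3)])
print(quiet < 0.05)  # True
```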

Registering Tool Profiles

Profiles tell the engine what a real tool response looks like:

```python
engine = VerificationEngine()

# Network API tool — strict latency and schema checks
engine.register_tool_profile("search_web",
    expected_latency_ms=(100, 5000),
    required_fields=["results", "total_count"],
    forbidden_fields=["error", "mock"],
    response_patterns=[r'"results"\s*:\s*\['],
    min_response_length=50,
    max_response_length=100000,
    has_network_io=True,
)

# Database query — different latency profile
engine.register_tool_profile("query_db",
    expected_latency_ms=(5, 30000),
    required_fields=["rows"],
    has_network_io=True,
)

# Pure computation — very fast is expected
engine.register_tool_profile("calculate",
    expected_latency_ms=(0.1, 100),
    has_network_io=False,  # No network I/O expected
)
```

ToolProfile Fields

| Field | Type | Default | Description |
|---|---|---|---|
| expected_latency_ms | (float, float) | (50, 30000) | Min/max plausible latency range |
| required_fields | list[str] | [] | Fields that must appear in real responses |
| forbidden_fields | list[str] | [] | Fields that should never appear |
| response_patterns | list[str] | [] | Regex patterns that real responses match |
| min_response_length | int \| None | None | Minimum JSON-serialised length |
| max_response_length | int \| None | None | Maximum JSON-serialised length |
| has_network_io | bool | True | Whether the tool makes network calls |

SPC Baselines

The engine maintains Statistical Process Control (SPC) baselines for each tool using Welford’s online algorithm for running mean/variance, and applies the four Western Electric rules:

| Rule | Condition | Weight |
|---|---|---|
| Rule 1 | 1 point beyond 3σ from mean | 0.40 |
| Rule 2 | 2 of last 3 points beyond 2σ (same side) | 0.25 |
| Rule 3 | 4 of last 5 points beyond 1σ (same side) | 0.20 |
| Rule 4 | 8 consecutive points on same side of mean | 0.15 |

SPC checks require at least 8 prior observations before they activate. The baseline tracks latency, response size, field frequency, and per-field numeric ranges — all using RunningStats (Welford’s algorithm with a 100-observation circular buffer).
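RunningStats is internal to the engine, but Welford's update itself is compact. A minimal re-implementation of the running mean/variance plus the Rule 1 check, with the 8-observation warm-up from above (names and structure are illustrative, not the library source):

```python
class RunningStats:
    """Welford's online mean/variance (illustrative re-implementation)."""

    def __init__(self) -> None:
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # accumulates sum of squared deviations

    @property
    def std(self) -> float:
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

    def rule1_violated(self, x: float) -> bool:
        # Western Electric Rule 1: a single point beyond 3σ of the mean,
        # active only once 8 baseline observations exist
        return self.n >= 8 and self.std > 0 and abs(x - self.mean) > 3 * self.std

stats = RunningStats()
for latency_ms in [210, 195, 205, 220, 198, 203, 215, 207]:
    stats.update(latency_ms)
print(stats.rule1_violated(900))  # True: far beyond 3σ of the ~205 ms baseline
```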

Cross-Session Consistency

The ConsistencyTracker detects implausible swings in tool outputs:

```python
result = engine.verify("get_stock_price",
    args={"ticker": "NVDA"},
    result={"price": 50.0, "currency": "USD"},
    execution_time_ms=250.0,
    session_id="session-123",  # Enable session consistency
)

# If historical calls for NVDA returned ~$650, this will fire:
# result.signals["historical_inconsistency"].fired == True
# result.signals["historical_inconsistency"].detail == "Field 'price': ..."
print(result.verdict)  # "flag" or "block" depending on other signals
```
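The tracker's internals aren't shown here, but the core idea, flagging a value that departs sharply from the historical record, can be sketched as a relative-deviation test (purely illustrative; the function name and 50% threshold are assumptions, not the library's rule):

```python
def implausible_swing(history: list[float], new_value: float,
                      max_rel_change: float = 0.5) -> bool:
    """Flag a numeric value that deviates sharply from the historical mean.

    Purely illustrative: the real ConsistencyTracker's rule is not shown here.
    """
    if not history:
        return False  # no history yet, nothing to contradict
    mean = sum(history) / len(history)
    return abs(new_value - mean) / abs(mean) > max_rel_change

print(implausible_swing([648.0, 652.0, 655.0], 50.0))  # True: ~92% drop
```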

VerificationResult

Every call to engine.verify() returns a VerificationResult:

| Field | Type | Description |
|---|---|---|
| verdict | str | "accept", "flag", or "block" |
| confidence | float | P(hallucination) in [0.0, 1.0] |
| signals | dict[str, SignalResult] | Per-signal details |
| tier_reached | VerificationTier | TIER_0 or TIER_1 |
| explanation | str | Human-readable summary |
| prior | float | Prior P(H) used |
| posterior | float | Final P(H) after all signals |
| is_hallucinated | bool | True when verdict is "block" (backward compat) |

Adaptive Thresholds

The engine learns per-tool thresholds from feedback using Exponential Moving Average (EMA) updates. As you provide labelled feedback, thresholds adapt to each tool’s actual hallucination rate:

```python
# After reviewing a result, provide feedback
engine.record_feedback("search_web",
    confidence_score=0.35,
    was_hallucination=True,   # This was actually hallucinated
)

# The engine will automatically:
# 1. Update the EMA hallucination rate for search_web
# 2. Lower the blocking threshold (be stricter)
# 3. Adjust per-tool prior for future verifications
```
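The EMA update itself is a one-liner. A sketch assuming a smoothing factor α = 0.1 (the library's actual α is not documented here, and the function name is illustrative):

```python
def ema_update(rate: float, was_hallucination: bool, alpha: float = 0.1) -> float:
    """Exponential moving average of a tool's observed hallucination rate."""
    label = 1.0 if was_hallucination else 0.0
    return alpha * label + (1 - alpha) * rate

rate = 0.05                    # current per-tool estimate (illustrative)
rate = ema_update(rate, True)  # a confirmed hallucination nudges the rate up
print(round(rate, 3))  # 0.145
```

Higher α makes the estimate react faster to recent feedback at the cost of more noise; lower α smooths over longer horizons.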

See the Calibration guide for the complete tuning workflow.
