Calibration

Tune the VerificationEngine's sensitivity to match your deployment environment.

Why Calibrate?

Every deployment is different. A tool that is hallucination-prone in one environment may be perfectly reliable in another. Calibration lets you tune the trade-off between false positives (blocking legitimate results) and false negatives (missing actual hallucinations).

The calibrate() API

Use engine.calibrate() to tune detection sensitivity globally or per-tool:

```python
from agentguard.verification import VerificationEngine

engine = VerificationEngine()

# Make a specific tool stricter (catches more hallucinations)
engine.calibrate("search_web",
    accept_threshold=0.1,   # Lower = stricter acceptance
    block_threshold=0.3,    # Lower = more blocking
    prior=0.25,             # Higher base hallucination rate
)

# Make another tool more lenient (fewer false positives)
engine.calibrate("get_time",
    accept_threshold=0.4,
    block_threshold=0.8,
    prior=0.05,
)

# Tune global settings (all tools without per-tool overrides)
engine.calibrate(
    accept_threshold=0.15,
    block_threshold=0.45,
)
```

calibrate() Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `tool_name` | `str \| None` | `None` | Tool to calibrate. `None` for global settings. |
| `accept_threshold` | `float \| None` | `0.2` | P(H) below this → auto-accept |
| `block_threshold` | `float \| None` | `0.5` | P(H) above this → auto-block |
| `prior` | `float \| None` | `0.15` | Base P(hallucination) for this tool |
| `likelihood_ratios` | `dict \| None` | `None` | Per-signal LR overrides |
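The two thresholds partition the posterior P(H) into three bands. A minimal standalone sketch of that decision rule (the verdict names here are illustrative, not necessarily the engine's actual return values):

```python
def verdict(p_hallucination: float,
            accept_threshold: float = 0.2,
            block_threshold: float = 0.5) -> str:
    """Map a posterior P(H) onto the three-way decision described above."""
    if p_hallucination < accept_threshold:
        return "accept"   # confidently legitimate
    if p_hallucination > block_threshold:
        return "block"    # confidently hallucinated
    return "flag"         # uncertain: route to review

print(verdict(0.05))  # accept
print(verdict(0.35))  # flag
print(verdict(0.72))  # block
```

Note the middle band: anything between the two thresholds is neither auto-accepted nor auto-blocked, which is why narrowing the gap makes the engine more decisive in both directions.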

Tuning Signal Weights

You can adjust how much each signal contributes to the posterior by overriding its likelihood ratio:

```python
# Make schema violations much more damning
engine.calibrate("search_web",
    likelihood_ratios={
        "schema_mismatch": 20.0,    # Was 12.0
        "latency_anomaly": 2.0,     # Was 3.5 (reduce weight)
    },
)

# Globally increase weight of session inconsistency
engine.calibrate(
    likelihood_ratios={
        "session_inconsistency": 8.0,  # Was 4.0
    },
)
```
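Likelihood ratios combine with the prior multiplicatively in odds space. A standalone sketch of that arithmetic (the exact update the engine performs internally is an assumption here):

```python
def posterior(prior: float,
              likelihood_ratios: dict[str, float],
              fired_signals: list[str]) -> float:
    """Multiply prior odds by the LR of each signal that fired,
    then convert back to a probability."""
    odds = prior / (1.0 - prior)
    for name in fired_signals:
        odds *= likelihood_ratios.get(name, 1.0)
    return odds / (1.0 + odds)

lrs = {"schema_mismatch": 20.0, "latency_anomaly": 2.0}
p = posterior(0.15, lrs, ["schema_mismatch"])
print(round(p, 3))  # 0.779 — a 20x LR lifts a 0.15 prior past most block thresholds
```

This is why raising a signal's LR makes it more "damning": a single high-LR signal can push the posterior from the accept band straight into the block band.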

Continuous Learning with record_feedback()

The engine adapts over time when you provide labelled feedback:

```python
# After human review or automated validation:
engine.record_feedback("search_web",
    confidence_score=0.3,
    was_hallucination=True,
)

engine.record_feedback("search_web",
    confidence_score=0.1,
    was_hallucination=False,
)

# The engine uses EMA (alpha=0.1) to update:
# - Per-tool hallucination rate (affects prior)
# - Per-tool blocking threshold (adapts to actual rate)
```

EMA learning rate

The default EMA alpha is 0.1, meaning each feedback sample contributes 10% to the updated estimate. This provides smooth adaptation without overreacting to individual samples.
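A minimal sketch of that update rule applied to the per-tool hallucination rate (standalone, not the engine's internal code):

```python
def ema_update(current: float, observation: float, alpha: float = 0.1) -> float:
    """Exponential moving average: each new sample contributes alpha."""
    return (1.0 - alpha) * current + alpha * observation

rate = 0.15  # starting per-tool hallucination rate
for was_hallucination in [True, False, True]:  # labelled feedback stream
    rate = ema_update(rate, 1.0 if was_hallucination else 0.0)
print(round(rate, 2))  # 0.29
```

Because alpha is small, a single mislabelled sample moves the estimate by at most 10% of the distance to 0 or 1, while a sustained run of labels steadily pulls the rate toward the true frequency.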

Inspecting Calibration

```python
# Check current settings for a tool
cal = engine.get_calibration("search_web")
print(cal)
# {
#   'accept_threshold': 0.1,
#   'block_threshold': 0.3,
#   'prior': 0.25,
#   'tool_name': 'search_web',
#   'tool_threshold': 0.3,
#   'hallucination_rate': 0.22,
#   'total_feedback': 47,
#   'baseline_calls': 1205,
#   'baseline_latency': {'count': 1205, 'mean': 342.5, 'std': 89.3, ...}
# }

# Check global settings
global_cal = engine.get_calibration()
print(global_cal)
# {'accept_threshold': 0.15, 'block_threshold': 0.45, 'prior': 0.15}
```

Recommended Calibration Workflow

  1. Start with defaults. Deploy the engine with default settings (prior=0.15, accept_threshold=0.2, block_threshold=0.5).
  2. Monitor for 24–48 hours. Log all verdicts and manually review flagged/blocked results.
  3. Record feedback. Call record_feedback() for every reviewed result.
  4. Check calibration. Use get_calibration() to see how thresholds have adapted.
  5. Fine-tune. If a tool has too many false positives, increase its accept_threshold or lower its prior. If it misses hallucinations, do the opposite.
  6. Iterate. Repeat steps 2–5 as traffic patterns change.
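The adaptive core of steps 3–5 can be simulated end to end without the library; `ToolStats` below is an illustrative stand-in for the engine's per-tool state, not an agentguard class:

```python
from dataclasses import dataclass

@dataclass
class ToolStats:
    """Stand-in for per-tool calibration state (illustrative only)."""
    hallucination_rate: float = 0.15  # prior, updated by feedback
    block_threshold: float = 0.5
    alpha: float = 0.1                # EMA learning rate

    def record_feedback(self, was_hallucination: bool) -> None:
        obs = 1.0 if was_hallucination else 0.0
        self.hallucination_rate = ((1 - self.alpha) * self.hallucination_rate
                                   + self.alpha * obs)

stats = ToolStats()
for label in [True, True, False, True]:  # reviewed verdicts (step 3)
    stats.record_feedback(label)

# Step 5: tighten blocking when the observed rate runs high
if stats.hallucination_rate > 0.2:
    stats.block_threshold = 0.4

print(round(stats.hallucination_rate, 2), stats.block_threshold)  # 0.35 0.4
```

In production the feedback stream would come from your review queue, and the threshold adjustment from `engine.calibrate()` rather than a direct field write; the point is only that the loop converges as labels accumulate.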