# Hallucination Detection
Multi-signal analysis to catch when LLMs fabricate tool call results.
## The Problem
When an LLM fabricates a tool call result instead of actually executing the tool, it produces output instantly (no real I/O), often misses fields a real API would return, and generates suspiciously "clean" data. These hallucinated results can silently propagate through your agent's reasoning chain.
In testing across major LLMs, tool call hallucination rates range from 5% to 15%, depending on the model and prompt complexity. Even GPT-4 occasionally fabricates plausible-looking API responses.
## How Detection Works
agentguard uses a multi-signal approach. Each signal produces a confidence score, and the final hallucination verdict is a weighted combination.
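The weighted combination can be sketched as follows. The signal names and weights below are illustrative assumptions, not agentguard's actual internals:

```python
# Illustrative sketch of a weighted multi-signal combination.
# These weights are hypothetical, not agentguard's real defaults.
SIGNAL_WEIGHTS = {
    "latency_anomaly": 0.4,
    "missing_fields": 0.3,
    "pattern_mismatch": 0.2,
    "forbidden_pattern": 0.1,
}

def combine_signals(scores: dict) -> float:
    """Weighted average of per-signal scores, normalized over fired signals."""
    total_weight = sum(SIGNAL_WEIGHTS[name] for name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(SIGNAL_WEIGHTS[name] * score for name, score in scores.items())
    return weighted / total_weight

confidence = combine_signals({"latency_anomaly": 1.0, "missing_fields": 0.5})
print(round(confidence, 2))  # (0.4*1.0 + 0.3*0.5) / 0.7 ≈ 0.79
```

Normalizing over only the signals that were evaluated keeps the score comparable when some signals (e.g. a missing profile) cannot fire.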
### Signal 1: Latency Analysis

Real API calls typically take 100 ms–10 s. If a "database query" returns in 0.3 ms, it almost certainly never hit a real database. agentguard measures wall-clock execution time and compares it to the tool's expected range.
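A minimal sketch of a latency check. The helper name and range format are assumptions for illustration, not agentguard's API:

```python
def latency_signal(execution_time_ms: float, expected_range_ms: tuple) -> float:
    """Return 1.0 if the measured wall-clock time falls outside the
    expected range (suspiciously fast or slow), else 0.0.
    Hypothetical helper illustrating the latency signal."""
    lo, hi = expected_range_ms
    return 0.0 if lo <= execution_time_ms <= hi else 1.0

# A "database query" answering in 0.3 ms is outside the 100 ms - 5 s range:
print(latency_signal(0.3, (100, 5000)))  # 1.0
print(latency_signal(250, (100, 5000)))  # 0.0
```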
### Signal 2: Required Fields

If `get_weather` should return `{temperature, humidity, conditions}` but the response contains only `{temperature}`, something's wrong. You define the schema; agentguard checks responses against it.
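The field check amounts to set difference. A sketch with a hypothetical helper name:

```python
def missing_fields(response: dict, required: list) -> list:
    """Return the required fields absent from the response
    (illustrative helper, not agentguard's API)."""
    return [field for field in required if field not in response]

print(missing_fields(
    {"temperature": 72},
    ["temperature", "humidity", "conditions"],
))  # ['humidity', 'conditions']
```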
### Signal 3: Response Patterns

Regex patterns that real responses should match: ISO-format timestamps, UUIDs, specific status codes. Fabricated data often fails to match them.
### Signal 4: Forbidden Patterns

Patterns that indicate fabrication: overly round numbers, or placeholder-style data like "John Doe" and "123 Main St".
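Signals 3 and 4 are mirror images of each other: one flags expected patterns that are absent, the other flags forbidden patterns that are present. A sketch, with hypothetical helper and signal names:

```python
import json
import re

def pattern_signals(response: dict, response_patterns: list, forbidden_patterns: list) -> list:
    """Return the names of pattern-based signals that fired.
    Illustrative only; the real checks may work differently."""
    raw = json.dumps(response)
    fired = []
    # Signal 3: any expected pattern missing from the serialized response?
    if any(not re.search(p, raw) for p in response_patterns):
        fired.append("pattern_mismatch")
    # Signal 4: any forbidden (placeholder-style) pattern present?
    if any(re.search(p, raw) for p in forbidden_patterns):
        fired.append("forbidden_pattern")
    return fired

print(pattern_signals(
    {"temperature": 72, "name": "John Doe"},
    response_patterns=[r'"temperature":\s*-?\d'],
    forbidden_patterns=[r"John Doe"],
))  # ['forbidden_pattern']
```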
## Basic Usage
```python
import requests

from agentguard import guard

# Basic — just enable it
@guard(detect_hallucination=True)
def get_weather(city: str) -> dict:
    return requests.get(f"https://api.weather.com/{city}").json()
```
## Advanced: Hallucination Profiles
Register a profile for precise detection tailored to each tool's expected behavior:
```python
import requests

from agentguard import guard

@guard(detect_hallucination=True)
def get_weather(city: str) -> dict:
    return requests.get(f"https://api.weather.com/{city}").json()

# Register a hallucination profile for precise detection
get_weather.register_hallucination_profile(
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity", "conditions"],
    response_patterns=[r'"temperature":\s*-?\d'],
    forbidden_patterns=[r'"temperature":\s*72\b'],  # Suspiciously common default
)
```
## Standalone Detector
Use the detector without the decorator for manual verification:
```python
from agentguard import HallucinationDetector

detector = HallucinationDetector(threshold=0.5)
detector.register_tool(
    "get_weather",
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity"],
)

result = detector.verify(
    "get_weather",
    execution_time_ms=0.3,
    response={"temperature": 72},
)
print(result.is_hallucinated)  # True
print(result.confidence)       # 0.85
print(result.signals)          # ["latency_anomaly", "missing_fields"]
```
## Confidence Scoring
| Confidence | Interpretation | Default Action |
|---|---|---|
| 0.0 – 0.3 | Likely real | Pass through |
| 0.3 – 0.5 | Uncertain | Log warning |
| 0.5 – 0.8 | Probably fabricated | Retry the call |
| 0.8 – 1.0 | Almost certainly fabricated | Raise `HallucinationError` |
Start with the default threshold of 0.5 and adjust based on your use case. High-stakes applications (e.g. financial or medical) should use a lower threshold, such as 0.3.
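The default actions in the table can be sketched as a simple dispatch. The `HallucinationError` class and function below are illustrative stand-ins, not agentguard's exports:

```python
import logging

class HallucinationError(Exception):
    """Raised when a response is almost certainly fabricated (sketch only)."""

def act_on_confidence(confidence: float) -> str:
    """Map a confidence score to the default actions from the table above.
    Thresholds are the documented defaults; tune them for your risk profile."""
    if confidence < 0.3:
        return "pass"          # Likely real: pass through
    if confidence < 0.5:
        logging.warning("possible hallucination (confidence=%.2f)", confidence)
        return "warn"          # Uncertain: log and continue
    if confidence < 0.8:
        return "retry"         # Probably fabricated: retry the call
    raise HallucinationError(f"confidence={confidence:.2f}")

print(act_on_confidence(0.55))  # retry
```

Lowering the boundaries (e.g. retrying at 0.3) trades extra tool calls for fewer fabricated results reaching the agent, which is the right trade for high-stakes domains.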