Hallucination Detection

Multi-signal analysis to catch when LLMs fabricate tool call results.

The Problem

When an LLM fabricates a tool call result instead of actually executing the tool, it produces output instantly (no real I/O), often misses fields a real API would return, and generates suspiciously "clean" data. These hallucinated results can silently propagate through your agent's reasoning chain.

⚠ This is more common than you think

In testing across major LLMs, tool-call hallucination rates range from 5–15% depending on the model and prompt complexity. Even GPT-4 occasionally fabricates plausible-looking API responses.

How Detection Works

agentguard uses a multi-signal approach. Each signal produces a confidence score, and the final hallucination verdict is a weighted combination.
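The combination step can be sketched as a weighted average over the signals that fired. The signal names and weights below are illustrative assumptions, not agentguard's internal values:

```python
# Hypothetical signal weights — agentguard's actual weighting is internal.
SIGNAL_WEIGHTS = {
    "latency_anomaly": 0.35,
    "missing_fields": 0.30,
    "pattern_mismatch": 0.20,
    "forbidden_pattern": 0.15,
}

def combine(signal_scores: dict[str, float], threshold: float = 0.5) -> tuple[float, bool]:
    """Weighted average of the per-signal scores; verdict is score >= threshold."""
    weight = sum(SIGNAL_WEIGHTS[name] for name in signal_scores)
    if weight == 0:
        return 0.0, False
    total = sum(SIGNAL_WEIGHTS[name] * score for name, score in signal_scores.items())
    confidence = total / weight
    return confidence, confidence >= threshold
```

With only the weights of triggered signals in the denominator, a single strong signal (e.g. an extreme latency anomaly) can still push the score over the threshold.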

Signal 1: Latency Analysis

Real API calls typically take 100ms–10s. If a "database query" returns in 0.3ms, it almost certainly was never executed. agentguard measures wall-clock execution time and compares it against the tool's expected range.
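The core check is simple: flag any execution that falls outside the expected window. A minimal sketch (function name is illustrative, not agentguard's API):

```python
def latency_anomaly(execution_ms: float, expected_ms: tuple[float, float]) -> bool:
    """Flag executions outside the tool's expected latency window.
    A far-too-fast "result" is the classic fabrication signal."""
    low, high = expected_ms
    return execution_ms < low or execution_ms > high

# A 0.3ms "database query" against a 100ms–5s expectation:
latency_anomaly(0.3, (100, 5000))  # True
```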

Signal 2: Required Fields

If get_weather should return {temperature, humidity, conditions} but the response only has {temperature}, something is wrong. You define the schema; agentguard checks it.
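Conceptually this signal is just a set difference between the declared schema and the response keys. A sketch (not agentguard's internals):

```python
def missing_fields(response: dict, required: list[str]) -> list[str]:
    """Return the required keys absent from the tool's response."""
    return [field for field in required if field not in response]

# A fabricated weather response missing two of three required fields:
missing_fields({"temperature": 72}, ["temperature", "humidity", "conditions"])
# ["humidity", "conditions"]
```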

Signal 3: Response Patterns

Regex patterns that real responses match — timestamps in ISO format, UUIDs, specific status codes. Fabricated data often fails to match these patterns.

Signal 4: Forbidden Patterns

Patterns that indicate fabrication — overly round numbers, placeholder-style data like "John Doe" or "123 Main St".
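Signals 3 and 4 are two sides of the same check: serialize the response and run regexes against it. Expected patterns should all match; forbidden patterns should not match at all. A sketch under that assumption (function name is illustrative):

```python
import json
import re

def pattern_signals(response: dict, expected: list[str], forbidden: list[str]) -> dict:
    """Check the serialized response against expected and forbidden regexes."""
    text = json.dumps(response)
    return {
        "missing_expected": [p for p in expected if not re.search(p, text)],
        "forbidden_hits": [p for p in forbidden if re.search(p, text)],
    }

# The suspiciously-round default temperature trips the forbidden pattern:
pattern_signals(
    {"temperature": 72},
    expected=[r'"temperature":\s*-?\d'],
    forbidden=[r'"temperature":\s*72\b'],
)
# {"missing_expected": [], "forbidden_hits": ['"temperature":\\s*72\\b']}
```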

Basic Usage

python
import requests

from agentguard import guard

# Basic — just enable it
@guard(detect_hallucination=True)
def get_weather(city: str) -> dict:
    return requests.get(f"https://api.weather.com/{city}").json()

Advanced: Hallucination Profiles

Register a profile for precise detection tailored to each tool's expected behavior:

python
import requests

from agentguard import guard

@guard(detect_hallucination=True)
def get_weather(city: str) -> dict:
    return requests.get(f"https://api.weather.com/{city}").json()

# Register a hallucination profile for precise detection
get_weather.register_hallucination_profile(
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity", "conditions"],
    response_patterns=[r'"temperature":\s*-?\d'],
    forbidden_patterns=[r'"temperature":\s*72\b'],  # Suspiciously common default
)

Standalone Detector

Use the detector without the decorator for manual verification:

python
from agentguard import HallucinationDetector

detector = HallucinationDetector(threshold=0.5)
detector.register_tool("get_weather",
    expected_latency_ms=(100, 5000),
    required_fields=["temperature", "humidity"],
)

result = detector.verify("get_weather",
    execution_time_ms=0.3,
    response={"temperature": 72}
)
print(result.is_hallucinated)  # True
print(result.confidence)        # 0.85
print(result.signals)           # ["latency_anomaly", "missing_fields"]

Confidence Scoring

Confidence   Interpretation                 Default Action
0.0 – 0.3    Likely real                    Pass through
0.3 – 0.5    Uncertain                      Log warning
0.5 – 0.8    Probably fabricated            Retry the call
0.8 – 1.0    Almost certainly fabricated    Raise HallucinationError
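The default actions can be read as a simple threshold ladder. A sketch of the mapping (function and return values are illustrative, not agentguard's API):

```python
def default_action(confidence: float) -> str:
    """Map a hallucination confidence score to its default action."""
    if confidence < 0.3:
        return "pass_through"
    if confidence < 0.5:
        return "log_warning"
    if confidence < 0.8:
        return "retry"
    return "raise_hallucination_error"

default_action(0.85)  # "raise_hallucination_error"
```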
✅ Tip

Start with the default threshold of 0.5 and adjust based on your use case. High-stakes applications (financial, medical) should use a lower threshold like 0.3.
