Runtime budget control and tool-call reliability for AI agents
Hard spend caps · Real LLM cost tracking · Tool validation · Response verification · Shared multi-agent budgets · Tracing and tests
pip install awesome-agentguard
Quick Start
Start with the two things teams actually need in production: keep runs inside budget and make tool calls trustworthy.
import os
import requests
from openai import OpenAI
from agentguard import TokenBudget, guard
from agentguard.integrations import guard_openai_client

# 1. Put a hard cap on model spend
budget = TokenBudget(max_cost_per_session=5.00, max_calls_per_session=100)
client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

# 2. Guard the tools your agent depends on
@guard(validate_input=True, verify_response=True, max_retries=2)
def search_web(query: str) -> dict:
    return requests.get(f"https://api.search.com?q={query}").json()
Use @guard with no arguments for basic wrapping, then layer in budgets and response profiles as your agent gets more expensive or more critical.
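To make the wrapping concrete, here is a minimal sketch of the general pattern a guard-style decorator implements: validate inputs before spending a call, then retry with backoff on failure. The names (`simple_guard`, the `validate_input` callable) are hypothetical illustrations, not agentguard's actual internals or API.

```python
import functools
import time

def simple_guard(max_retries=2, validate_input=None):
    """Hypothetical sketch of the validate-then-retry pattern a tool
    guard implements; not agentguard's actual implementation."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Reject bad inputs before spending a (possibly costly) call
            if validate_input is not None and not validate_input(*args, **kwargs):
                raise ValueError(f"invalid input for {fn.__name__}")
            last_exc = None
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    time.sleep(0.01 * 2 ** attempt)  # exponential backoff
            raise last_exc
        return wrapper
    return decorator

@simple_guard(max_retries=2,
              validate_input=lambda q: isinstance(q, str) and q.strip() != "")
def search_web(query: str) -> dict:
    return {"query": query, "results": []}
```

The real decorator layers verification on top of this skeleton, but the control flow (validate, call, retry) is the same shape.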
Installation
Install agentguard from PyPI:
pip install awesome-agentguard
Optional Dependencies
pip install awesome-agentguard[all] # OpenAI + Anthropic + LangChain integrations
pip install awesome-agentguard[costs] # LiteLLM-backed real LLM cost tracking
pip install awesome-agentguard[rich] # Rich terminal output
Requirements: Python 3.10+ · Only core dependency: pydantic>=2.0.
Want the full technical documentation, API reference, and deeper guides? Visit the detailed docs at rigvedrs.github.io/agentguard/.
Features
Why This Shape
agentguard is pitched around the two problems teams feel first in production: runaway spend and untrustworthy tool execution. The verification engine addresses the second, and it combines several well-established statistical techniques:
- Latency-as-proof — What it is: use runtime as a sanity check (a real network/database call can’t consistently finish in ~0–2ms). Why: catches “tool-result hallucinations” early when a tool claims it ran but the timing makes that basically impossible.
- Log-odds Bayesian fusion — What it is: convert each weak signal (latency, schema validity, past accuracy, etc.) into a log-likelihood ratio, then add them up. Why: combining evidence becomes stable and interpretable (each signal contributes a clear “push” toward trust or distrust).
- Western Electric SPC rules — What it is: classic Statistical Process Control heuristics (a small set of rules) for spotting when a metric’s behavior shifts. Why: detects regressions like “this tool suddenly got flaky” without needing a perfect model of the tool.
- Cross-session consistency — What it is: compare today’s tool outputs to historical outputs for the same tool + args pattern. Why: flags surprising deviations (often a bug, stale data, or fabrication) when the “same question” starts returning incompatible answers.
- Adaptive thresholds — What it is: update decision thresholds over time with an Exponential Moving Average (EMA) from real feedback. Why: reduces false alarms and improves detection as your environment changes (new infra, new data, new models).
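The latency-as-proof and log-odds fusion ideas above can be sketched in a few lines: each signal becomes a log-likelihood ratio, the ratios are summed, and the logistic function maps the total back to a trust probability. The probabilities below are illustrative numbers chosen for the example, not agentguard's calibrated weights.

```python
import math

def llr(p_given_genuine: float, p_given_fake: float) -> float:
    """Log-likelihood ratio: how strongly one observed signal pushes
    toward 'genuine' (positive) or 'fake' (negative)."""
    return math.log(p_given_genuine / p_given_fake)

def fused_trust(signal_llrs, prior_genuine=0.5):
    """Naive-Bayes fusion in log-odds space: add up the evidence,
    then map back to a probability with the logistic function."""
    log_odds = math.log(prior_genuine / (1 - prior_genuine)) + sum(signal_llrs)
    return 1 / (1 + math.exp(-log_odds))

# One tool call's evidence (probabilities are made up for illustration):
elapsed_ms = 0.4  # a "network" call that returned in under a millisecond
signals = [
    # Latency-as-proof: sub-2ms completion is common for fabricated
    # results but rare for real network/database I/O
    llr(0.02, 0.60) if elapsed_ms < 2.0 else llr(0.98, 0.40),
    # Schema validity: the response parsed against the expected schema
    llr(0.95, 0.50),
]
trust = fused_trust(signals)
# The impossibly fast latency dominates: trust lands well below 0.5
```

Because the signals add in log-odds space, each one contributes an interpretable "push": here the valid schema adds about +0.64 while the impossible latency subtracts about 3.4, so the call is flagged despite looking well-formed.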
Framework Support
Works with every major AI framework and provider out of the box, while keeping one consistent runtime layer for both budget control and tool-call safety.
agentguard also supports real response-based LLM cost tracking for OpenAI, Anthropic, and OpenAI-compatible providers via optional LiteLLM-backed pricing.
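Response-based cost tracking boils down to multiplying the token counts the provider reports in each response by per-model prices. A minimal sketch of that arithmetic, with a made-up model name and prices (real tracking pulls current prices from a maintained table such as LiteLLM's, not hardcoded constants):

```python
# Illustrative per-1M-token prices for a hypothetical model;
# not real pricing data.
PRICES_PER_1M = {
    "example-model": {"input": 2.50, "output": 10.00},
}

def response_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one call, computed from the usage block
    the provider returns with each response."""
    p = PRICES_PER_1M[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# A session budget is then just a running sum checked against the cap:
spent = response_cost("example-model", prompt_tokens=1200, completion_tokens=300)
# 1200 * 2.50/1e6 + 300 * 10.00/1e6 = 0.003 + 0.003 = 0.006 USD
```

Keeping the sum per session (or shared across agents) is what turns this per-call arithmetic into the hard caps shown in the Quick Start.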