# Budget Enforcement

Cap costs and call counts per session, with real response-based LLM cost tracking for production agents.
## Overview
Budget enforcement caps costs per session to prevent runaway spending. Define maximum call counts, maximum spend amounts, and choose between blocking or warning when limits are approached. For supported LLM providers, agentguard can also read real usage from API responses and price it dynamically. This pairs naturally with agentguard's tool-call validation and response verification, so the same runtime controls both spend and tool reliability.
## Configuration
```python
import openai

from agentguard import guard
from agentguard.core.types import BudgetConfig, GuardAction

@guard(
    budget=BudgetConfig(
        max_cost_per_session=5.00,   # $5 max per session
        max_calls_per_session=100,   # 100 calls max
        cost_per_call=0.03,          # Explicit fallback cost
        on_exceed=GuardAction.BLOCK,
        alert_threshold=0.8,
        use_dynamic_llm_costs=True,
    )
)
def gpt4_call(prompt: str) -> str:
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```
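To make `alert_threshold` concrete, here is the arithmetic the configuration above implies, as a pure-Python illustration rather than agentguard's own code: with a $5.00 cap, 100 allowed calls, and a 0.8 threshold, warnings begin at $4.00 of session spend or at the 80th call.

```python
# Illustrative arithmetic only; not the agentguard implementation.
max_cost_per_session = 5.00
max_calls_per_session = 100
alert_threshold = 0.8

# Warnings begin once utilisation crosses the threshold on either limit.
warn_at_spend = max_cost_per_session * alert_threshold   # 4.0 dollars
warn_at_calls = max_calls_per_session * alert_threshold  # 80 calls

def should_warn(session_spend: float, session_calls: int) -> bool:
    """True once either limit is at least 80% consumed."""
    return (session_spend >= warn_at_spend
            or session_calls >= warn_at_calls)

print(should_warn(3.50, 10))   # False: under both thresholds
print(should_warn(4.20, 10))   # True: spend past $4.00
```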
### Configuration Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `max_cost_per_session` | `float` | `None` | Maximum spend in dollars per session |
| `max_calls_per_session` | `int` | `None` | Maximum number of calls per session |
| `cost_per_call` | `float \| None` | `None` | Explicit fallback cost when dynamic pricing cannot resolve a known price |
| `on_exceed` | `GuardAction` | `BLOCK` | Whether to block, warn, or log when a limit is exceeded |
| `alert_threshold` | `float` | `0.8` | Fraction of a limit (0.0–1.0) at which warnings start |
| `use_dynamic_llm_costs` | `bool` | `True` | Enable response-based LLM spend tracking |
| `model_pricing_overrides` | `dict \| None` | `None` | Per-model input/output pricing overrides in dollars per 1M tokens |
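Because `model_pricing_overrides` is expressed in dollars per 1M tokens, pricing a single response is a straightforward scaling of the usage counts. The following self-contained sketch shows the arithmetic; the override shape and the prices are illustrative, not agentguard's canonical schema or official provider rates:

```python
# Illustrative prices in dollars per 1M tokens; not official rates.
overrides = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def price_response(model: str, input_tokens: int, output_tokens: int) -> float:
    """Scale token counts by the per-1M-token override prices."""
    p = overrides[model]
    return (input_tokens / 1_000_000 * p["input"]
            + output_tokens / 1_000_000 * p["output"])

# 1,000 prompt tokens and 500 completion tokens:
cost = price_response("gpt-4o", 1_000, 500)
print(round(cost, 4))  # 0.0075
```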
## Real LLM Cost Tracking
For OpenAI, Anthropic, and OpenAI-compatible providers, wrap the client instead of manually computing cost in callback code. The wrapper reads provider usage, resolves pricing through LiteLLM when available, and records spend into the active budget.
```python
import os

from openai import OpenAI
from agentguard import InMemoryCostLedger, TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(max_cost_per_session=10.00)
budget.config.cost_ledger = InMemoryCostLedger()

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this page"}],
)
```
Pricing resolution order is:

1. `model_pricing_overrides`
2. LiteLLM pricing
3. explicit `cost_per_call` fallback
4. otherwise usage is tracked and the cost is marked unknown
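The fallback chain above can be sketched as a single function. This illustrates the documented order only, not agentguard's internal code; the argument names and the shape of the LiteLLM price record are hypothetical:

```python
def resolve_cost(model, usage, overrides=None, litellm_price=None,
                 cost_per_call=None):
    """Return cost in dollars, or None when it must be marked unknown."""
    # 1. Per-model overrides, in dollars per 1M tokens.
    if overrides and model in overrides:
        p = overrides[model]
        return (usage["input"] / 1e6 * p["input"]
                + usage["output"] / 1e6 * p["output"])
    # 2. LiteLLM pricing (modelled here as a pre-resolved per-token pair).
    if litellm_price is not None:
        return (usage["input"] * litellm_price["input"]
                + usage["output"] * litellm_price["output"])
    # 3. Explicit flat fallback.
    if cost_per_call is not None:
        return cost_per_call
    # 4. Usage is still tracked, but the cost is unknown.
    return None
```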
## Monitoring Budget Usage
```python
# Check current budget state
print(budget.session_spend)        # 2.34
print(budget.session_calls)        # 78

stats = budget.stats()
print(stats.budget_utilisation)    # 0.468 (46.8%)
print(stats.calls_remaining)       # 22

# Reset budget (e.g., new session)
budget.reset()
```
Budget state is tracked in memory. If you run multiple processes, each has its own budget counter. Use a cost ledger if you want to retain spend events for reporting, or shared state middleware if you need cross-process enforcement.
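To make the ledger idea concrete, here is a minimal sketch of a ledger that records one spend event per call and totals the known costs for reporting. This illustrates the concept only; agentguard's `InMemoryCostLedger` API may differ, and the class and field names here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpendEvent:
    model: str
    cost: Optional[float]  # None when pricing could not be resolved

@dataclass
class SimpleCostLedger:
    """Illustrative ledger; not agentguard's InMemoryCostLedger."""
    events: list = field(default_factory=list)

    def record(self, model: str, cost: Optional[float]) -> None:
        self.events.append(SpendEvent(model, cost))

    def total_known_spend(self) -> float:
        # Events with unknown cost are kept for auditing but not summed.
        return sum(e.cost for e in self.events if e.cost is not None)

ledger = SimpleCostLedger()
ledger.record("gpt-4o", 0.0075)
ledger.record("local-model", None)   # usage tracked, cost unknown
print(ledger.total_known_spend())    # 0.0075
print(len(ledger.events))            # 2
```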