Budget Enforcement

Cap costs and call counts per session, with real response-based LLM cost tracking for production agents.

Overview

Budget enforcement caps costs per session to prevent runaway spending. Define maximum call counts, maximum spend amounts, and choose between blocking or warning when limits are approached. For supported LLM providers, agentguard can also read real usage from API responses and price it dynamically. This pairs naturally with agentguard's tool-call validation and response verification, so the same runtime controls both spend and tool reliability.

Configuration

```python
import openai

from agentguard import guard
from agentguard.core.types import BudgetConfig, GuardAction

@guard(
    budget=BudgetConfig(
        max_cost_per_session=5.00,    # $5 max per session
        max_calls_per_session=100,    # 100 calls max
        cost_per_call=0.03,           # Explicit fallback cost
        on_exceed=GuardAction.BLOCK,  # Block the call once a limit is exceeded
        alert_threshold=0.8,          # Warn at 80% of either limit
        use_dynamic_llm_costs=True,   # Prefer response-based pricing when available
    )
)
def gpt4_call(prompt: str) -> str:
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
```

Configuration Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_cost_per_session` | `float` | `None` | Maximum spend in dollars per session |
| `max_calls_per_session` | `int` | `None` | Maximum number of calls per session |
| `cost_per_call` | `float \| None` | `None` | Explicit fallback cost when dynamic pricing cannot resolve a known price |
| `on_exceed` | `GuardAction` | `BLOCK` | Whether to block, warn, or log when a limit is exceeded |
| `alert_threshold` | `float` | `0.8` | Fraction of a limit at which a warning fires (0.0–1.0) |
| `use_dynamic_llm_costs` | `bool` | `True` | Enable response-based LLM spend tracking |
| `model_pricing_overrides` | `dict \| None` | `None` | Per-model input/output pricing overrides in dollars per 1M tokens |
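To make the interaction between `alert_threshold` and `on_exceed` concrete, here is a minimal standalone sketch of the decision logic (illustrative only, not agentguard internals): with a $5.00 cap and a 0.8 threshold, warnings start at $4.00 and blocking at $5.00.

```python
# Illustrative sketch (not agentguard internals) of how alert_threshold
# and on_exceed=BLOCK interact for a single cost limit.
def budget_decision(spend: float, max_cost: float, threshold: float = 0.8) -> str:
    """Return 'block' at or past the cap, 'warn' past the threshold, else 'ok'."""
    if spend >= max_cost:
        return "block"  # on_exceed=GuardAction.BLOCK would reject the call
    if spend >= threshold * max_cost:
        return "warn"   # alert fires at threshold * cap ($4.00 here)
    return "ok"

print(budget_decision(2.50, 5.00))  # ok
print(budget_decision(4.20, 5.00))  # warn
print(budget_decision(5.00, 5.00))  # block
```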

Real LLM Cost Tracking

For OpenAI, Anthropic, and OpenAI-compatible providers, wrap the client instead of manually computing cost in callback code. The wrapper reads provider usage, resolves pricing through LiteLLM when available, and records spend into the active budget.

```python
import os

from openai import OpenAI
from agentguard import InMemoryCostLedger, TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(max_cost_per_session=10.00)
budget.config.cost_ledger = InMemoryCostLedger()

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this page"}],
)
```
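The spend the wrapper records is the response's token usage priced per million tokens. A standalone sketch of that arithmetic, using hypothetical prices (real prices come from overrides or LiteLLM, not these numbers):

```python
# Sketch of the per-response spend computation. The $2.50/$10.00 per-1M
# prices below are hypothetical placeholders, not real model pricing.
def price_usage(prompt_tokens: int, completion_tokens: int,
                input_per_1m: float, output_per_1m: float) -> float:
    """Cost in dollars for one response's token usage."""
    return (prompt_tokens * input_per_1m
            + completion_tokens * output_per_1m) / 1_000_000

# e.g. 1,200 prompt + 300 completion tokens at $2.50 / $10.00 per 1M tokens
cost = price_usage(1_200, 300, 2.50, 10.00)
print(f"${cost:.6f}")  # $0.006000
```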

Pricing resolution order is:

  1. `model_pricing_overrides`
  2. LiteLLM pricing
  3. explicit `cost_per_call` fallback
  4. if none resolve, usage is still tracked but the cost is marked unknown

Monitoring Budget Usage

```python
# Check current budget state
print(budget.session_spend)   # 2.34
print(budget.session_calls)   # 78

stats = budget.stats()
print(stats.budget_utilisation)  # 0.468 (46.8%)
print(stats.calls_remaining)     # 22

# Reset budget (e.g., new session)
budget.reset()
```
⚠ Budget enforcement is per-process

Budget state is tracked in memory. If you run multiple processes, each has its own budget counter. Use a cost ledger if you want to retain spend events for reporting, or shared state middleware if you need cross-process enforcement.
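As one illustration of the shared-state idea, a small SQLite-backed ledger lets several processes record and sum spend for the same session. This is a hedged sketch of the pattern, not an agentguard API:

```python
# Hedged sketch (not an agentguard API): an SQLite file as a shared spend
# ledger, so multiple processes can record and total session spend.
import sqlite3

class SharedLedger:
    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spend (session TEXT, cost REAL)"
        )

    def record(self, session: str, cost: float) -> None:
        with self.conn:  # commit atomically so concurrent writers don't clash
            self.conn.execute("INSERT INTO spend VALUES (?, ?)", (session, cost))

    def total(self, session: str) -> float:
        row = self.conn.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM spend WHERE session = ?",
            (session,),
        ).fetchone()
        return row[0]

ledger = SharedLedger(":memory:")  # use a real file path for cross-process sharing
ledger.record("sess-1", 0.03)
ledger.record("sess-1", 0.05)
print(round(ledger.total("sess-1"), 2))  # 0.08
```

A process would consult `total()` before each call and refuse to proceed once the session's cap is reached, giving enforcement across workers at the cost of a shared-storage dependency.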
