Budget Enforcement

Cap costs and call counts per session, with real response-based LLM cost tracking for production agents.

Overview

Budget enforcement caps costs per session to prevent runaway spending. Define maximum call counts, maximum spend amounts, and choose between blocking or warning when limits are approached. For supported LLM providers, agentguard can also read real usage from API responses and price it dynamically. This pairs naturally with agentguard's tool-call validation and response verification, so the same runtime controls both spend and tool reliability.

Configuration

```python
import openai

from agentguard import guard
from agentguard.core.types import BudgetConfig, GuardAction

@guard(
    budget=BudgetConfig(
        max_cost_per_session=5.00,    # $5 max per session
        max_calls_per_session=100,    # 100 calls max
        cost_per_call=0.03,           # Explicit fallback cost
        on_exceed=GuardAction.BLOCK,  # Block the call once a limit is exceeded
        alert_threshold=0.8,          # Warn at 80% of either limit
        use_dynamic_llm_costs=True,   # Prefer response-based pricing when available
    )
)
def gpt4_call(prompt: str) -> str:
    return openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
```

Configuration Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_cost_per_session` | `float` | `None` | Maximum spend in dollars per session |
| `max_calls_per_session` | `int` | `None` | Maximum number of calls per session |
| `cost_per_call` | `float \| None` | `None` | Explicit fallback cost when dynamic pricing cannot resolve a known price |
| `on_exceed` | `GuardAction` | `BLOCK` | Whether to block, warn, or log when a limit is exceeded |
| `alert_threshold` | `float` | `0.8` | Fraction of a limit at which a warning fires (0.0–1.0) |
| `use_dynamic_llm_costs` | `bool` | `True` | Enable response-based LLM spend tracking |
| `model_pricing_overrides` | `dict \| None` | `None` | Per-model input/output pricing overrides in dollars per 1M tokens |
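To make the interaction between `alert_threshold` and `on_exceed` concrete, here is a minimal standalone sketch of the decision logic (illustrative only, not agentguard internals): with a $5.00 cap and a 0.8 threshold, warnings start at $4.00 and blocking at $5.00.

```python
# Illustrative sketch (not agentguard internals) of how alert_threshold
# and on_exceed=BLOCK interact for a single cost limit.
def budget_decision(spend: float, max_cost: float, threshold: float = 0.8) -> str:
    """Return 'block' at or past the cap, 'warn' past the threshold, else 'ok'."""
    if spend >= max_cost:
        return "block"  # on_exceed=GuardAction.BLOCK would reject the call
    if spend >= threshold * max_cost:
        return "warn"   # alert fires at threshold * cap ($4.00 here)
    return "ok"

print(budget_decision(2.50, 5.00))  # ok
print(budget_decision(4.20, 5.00))  # warn
print(budget_decision(5.00, 5.00))  # block
```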

Real LLM Cost Tracking

For OpenAI, Anthropic, and OpenAI-compatible providers, wrap the client instead of manually computing cost in callback code. The wrapper reads provider usage, resolves pricing through LiteLLM when available, and records spend into the active budget.

```python
import os

from openai import OpenAI
from agentguard import InMemoryCostLedger, TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(max_cost_per_session=10.00)
budget.config.cost_ledger = InMemoryCostLedger()

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this page"}],
)
```
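The spend the wrapper records is the response's token usage priced per million tokens. A standalone sketch of that arithmetic, using hypothetical prices (real prices come from overrides or LiteLLM, not these numbers):

```python
# Sketch of the per-response spend computation. The $2.50/$10.00 per-1M
# prices below are hypothetical placeholders, not real model pricing.
def price_usage(prompt_tokens: int, completion_tokens: int,
                input_per_1m: float, output_per_1m: float) -> float:
    """Cost in dollars for one response's token usage."""
    return (prompt_tokens * input_per_1m
            + completion_tokens * output_per_1m) / 1_000_000

# e.g. 1,200 prompt + 300 completion tokens at $2.50 / $10.00 per 1M tokens
cost = price_usage(1_200, 300, 2.50, 10.00)
print(f"${cost:.6f}")  # $0.006000
```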

Pricing resolution order is:

  1. `model_pricing_overrides`
  2. LiteLLM pricing
  3. explicit `cost_per_call` fallback
  4. if none resolve, usage is still tracked but the cost is marked unknown

Monitoring Budget Usage

```python
# Check current budget state
print(budget.session_spend)   # 2.34
print(budget.session_calls)   # 78

stats = budget.stats()
print(stats.budget_utilisation)  # 0.468 (46.8%)
print(stats.calls_remaining)     # 22

# Reset budget (e.g., new session)
budget.reset()
```
⚠ Budget enforcement is per-process

Budget state is tracked in memory. If you run multiple processes, each has its own budget counter. Use a cost ledger if you want to retain spend events for reporting, or shared state middleware if you need cross-process enforcement.
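As one illustration of the shared-state idea, a small SQLite-backed ledger lets several processes record and sum spend for the same session. This is a hedged sketch of the pattern, not an agentguard API:

```python
# Hedged sketch (not an agentguard API): an SQLite file as a shared spend
# ledger, so multiple processes can record and total session spend.
import sqlite3

class SharedLedger:
    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spend (session TEXT, cost REAL)"
        )

    def record(self, session: str, cost: float) -> None:
        with self.conn:  # commit atomically so concurrent writers don't clash
            self.conn.execute("INSERT INTO spend VALUES (?, ?)", (session, cost))

    def total(self, session: str) -> float:
        row = self.conn.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM spend WHERE session = ?",
            (session,),
        ).fetchone()
        return row[0]

ledger = SharedLedger(":memory:")  # use a real file path for cross-process sharing
ledger.record("sess-1", 0.03)
ledger.record("sess-1", 0.05)
print(round(ledger.total("sess-1"), 2))  # 0.08
```

A process would consult `total()` before each call and refuse to proceed once the session's cap is reached, giving enforcement across workers at the cost of a shared-storage dependency.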
