Budget Enforcement

The Problem

AI agent costs are notoriously hard to predict. A single prompt change, retry loop, or model escalation can cause an agent to call a paid API 10× more often than intended. Without spending limits, the first sign of a problem may be an unexpected bill.

agentguard's TokenBudget and BudgetConfig enforce hard spending limits at the tool level. They are designed to work alongside tool-call validation and response verification, so the same runtime that catches broken tool execution also stops runaway spend.

For LLM API calls, agentguard now also supports real spend tracking by wrapping supported provider clients, reading provider-reported usage from responses, and resolving pricing through LiteLLM when available.

How It Works

The budget enforcer tracks:

- Per-call cost: how much this single call costs
- Session cost: cumulative cost since the session started
- Session call count: number of calls in this session

When usage crosses the configurable alert threshold for any limit, the enforcer emits a warning. When a limit is exceeded, the enforcer takes the configured action (block, warn, or log).
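The accounting described above can be sketched in plain Python. This is an illustration only, not agentguard's implementation: `SketchBudget`, `SketchBudgetExceeded`, `charge`, and the string actions are hypothetical names. The sketch also assumes that a limit counts as exceeded only when usage goes strictly above it, and that a blocked call is not added to the session totals.

```python
from dataclasses import dataclass
from typing import Optional


class SketchBudgetExceeded(Exception):
    """Hypothetical stand-in for a budget-exceeded error."""


@dataclass
class SketchBudget:
    max_cost_per_call: Optional[float] = None
    max_cost_per_session: Optional[float] = None
    max_calls_per_session: Optional[int] = None
    alert_threshold: float = 0.80
    on_exceed: str = "block"  # "block", "warn", or "log"
    session_cost: float = 0.0
    session_calls: int = 0

    def charge(self, cost: float) -> str:
        """Record one call and return "ok", "alert", or "exceeded"."""
        new_cost = self.session_cost + cost
        new_calls = self.session_calls + 1
        # Pair each observed value with its limit; None means unlimited.
        checks = [
            (cost, self.max_cost_per_call),
            (new_cost, self.max_cost_per_session),
            (new_calls, self.max_calls_per_session),
        ]
        exceeded = any(
            limit is not None and used > limit for used, limit in checks
        )
        if exceeded and self.on_exceed == "block":
            # Assumption: a blocked call is not counted against the session.
            raise SketchBudgetExceeded(f"limit exceeded at ${new_cost:.2f}")
        self.session_cost, self.session_calls = new_cost, new_calls
        if exceeded:
            return "exceeded"  # warn/log: the call still proceeds
        alert = any(
            limit is not None and used >= self.alert_threshold * limit
            for used, limit in checks
        )
        return "alert" if alert else "ok"


budget = SketchBudget(max_cost_per_session=10.00, alert_threshold=0.80)
print(budget.charge(5.00))  # ok
print(budget.charge(3.50))  # alert: $8.50 has crossed the $8.00 threshold
```

A further `budget.charge(2.00)` would raise `SketchBudgetExceeded`, because the session total would reach $10.50 against a $10.00 limit.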

Basic Usage

Using `TokenBudget` (convenience wrapper)

from agentguard import guard, TokenBudget

@guard(budget=TokenBudget(
    max_cost_per_session=5.00,      # Stop after $5 total
    max_calls_per_session=100,      # Stop after 100 calls
    alert_threshold=0.80,           # Warn at 80% usage
).config)
def call_openai(prompt: str) -> str:
    """Call OpenAI — costs money."""
    import openai
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

Using `BudgetConfig` directly

from agentguard import guard
from agentguard.core.types import BudgetConfig, GuardAction

@guard(budget=BudgetConfig(
    max_cost_per_call=0.50,         # Each call costs at most $0.50
    max_cost_per_session=10.00,     # Total session budget: $10
    max_calls_per_session=200,      # Max 200 calls per session
    alert_threshold=0.75,           # Alert at 75% of any limit
    on_exceed=GuardAction.BLOCK,    # Block when exceeded
    cost_per_call=0.01,             # Explicit fallback if dynamic pricing unavailable
    use_dynamic_llm_costs=True,     # Enable response-based LLM pricing
))
def my_expensive_tool(data: str) -> dict: ...

Configuration Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_cost_per_call` | `float \| None` | `None` | Max cost per single call. `None` = unlimited |
| `max_cost_per_session` | `float \| None` | `None` | Max cumulative cost per session. `None` = unlimited |
| `max_calls_per_session` | `int \| None` | `None` | Max call count per session. `None` = unlimited |
| `alert_threshold` | `float` | `0.80` | Fraction of limit at which to emit a warning (0–1) |
| `on_exceed` | `GuardAction` | `BLOCK` | Action when limit exceeded: `BLOCK`, `WARN`, or `LOG` |
| `cost_per_call` | `float \| None` | `None` | Explicit fixed fallback cost when dynamic pricing cannot produce a known price |
| `use_dynamic_llm_costs` | `bool` | `True` | Enable provider-response-based LLM cost tracking |
| `model_pricing_overrides` | `dict[str, tuple[float, float]] \| None` | `None` | Per-model input/output pricing overrides in dollars per 1M tokens |
| `record_llm_spend` | `bool` | `True` | Emit LLM spend metadata into traces, telemetry, and reports |
| `cost_ledger` | `CostLedger \| None` | `None` | Optional ledger for persisting spend events beyond in-memory session accounting |

Real LLM Cost Tracking

For provider-backed LLM calls, prefer wrapping the client instead of manually mutating result.cost in an after_call hook. The wrapper reads the provider's usage payload, resolves pricing, records spend once, and returns the native SDK response unchanged. This gives you a cleaner story in production: budget control for model calls, plus guarded execution for tool calls, in the same library.

import os
from openai import OpenAI
from agentguard import InMemoryCostLedger, TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(
    max_cost_per_session=5.00,
    max_calls_per_session=100,
)
ledger = InMemoryCostLedger()
budget.config.cost_ledger = ledger

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this document"}],
)

print(budget.session_spend)
print(ledger.query(model="gpt-4o"))

Pricing resolution order:

1. model_pricing_overrides
2. LiteLLM model pricing
3. Explicit cost_per_call
4. Otherwise, usage is tracked and cost remains unknown
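The resolution order above can be pictured with a small standalone function. This is a sketch under stated assumptions, not the library's code: `resolve_cost` is a hypothetical name, the `catalog` mapping stands in for the LiteLLM lookup, and the prices in the usage lines are made up for illustration. Prices are expressed in dollars per 1M tokens, matching the unit documented for model_pricing_overrides.

```python
from typing import Mapping, Optional, Tuple

Price = Tuple[float, float]  # (input, output) in dollars per 1M tokens


def resolve_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    overrides: Optional[Mapping[str, Price]] = None,
    catalog: Optional[Mapping[str, Price]] = None,
    cost_per_call: Optional[float] = None,
) -> Optional[float]:
    """Return the dollar cost of one call, or None when it is unknown."""
    # Steps 1 and 2: per-model overrides win over the catalog lookup.
    for table in (overrides, catalog):
        if table and model in table:
            input_price, output_price = table[model]
            total = input_tokens * input_price + output_tokens * output_price
            return total / 1_000_000
    # Step 3: fixed fallback. Step 4: None means usage known, cost unknown.
    return cost_per_call


# Illustrative prices only, not real provider pricing.
overrides = {"my-model": (2.0, 8.0)}
catalog = {"my-model": (1.0, 1.0), "other-model": (1.0, 3.0)}

print(resolve_cost("my-model", 1_000_000, 500_000, overrides, catalog))  # 6.0
print(resolve_cost("unlisted", 10, 10, overrides, catalog))  # None
```

Returning `None` rather than `0.0` for an unresolved price keeps "unknown" distinguishable from "free", which mirrors the last step of the list.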

Install LiteLLM support with:

pip install "awesome-agentguard[costs]"

Session Management

Budget tracking is per-session. Sessions are identified by session_id in the GuardConfig:

import uuid

from agentguard import GuardConfig
from agentguard.core.guard import GuardedTool
from agentguard.core.types import BudgetConfig

def search_fn(query: str) -> list[str]:
    """Placeholder for your own search tool."""
    ...

def create_agent_session() -> str:
    """Return a fresh random session ID for anonymous sessions."""
    return str(uuid.uuid4())

# Each user gets their own budget
def run_agent_for_user(user_id: str) -> None:
    config = GuardConfig(
        budget=BudgetConfig(
            max_cost_per_session=1.00,
            max_calls_per_session=50,
        ),
        session_id=f"user:{user_id}",
        record=True,
    )
    search = GuardedTool(search_fn, config=config)
    # This user's calls are tracked independently
    result = search(query="Python tutorials")

Without session_id, all calls to the same GuardedTool instance share a budget counter.
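To make the per-session behaviour concrete, here is a minimal sketch of counters keyed by session ID. It is an assumption-laden illustration rather than agentguard's internals: `SessionSpend` and `try_spend` are hypothetical names, and a missing session ID is modelled as a single shared `None` bucket.

```python
from collections import defaultdict
from typing import DefaultDict, Optional


class SessionSpend:
    """Hypothetical per-session spend counters keyed by session ID."""

    def __init__(self, max_cost_per_session: float) -> None:
        self.max_cost_per_session = max_cost_per_session
        self._spent: DefaultDict[Optional[str], float] = defaultdict(float)

    def try_spend(self, cost: float, session_id: Optional[str] = None) -> bool:
        """Record the cost and return True if the session stays in budget."""
        if self._spent[session_id] + cost > self.max_cost_per_session:
            return False
        self._spent[session_id] += cost
        return True


spend = SessionSpend(max_cost_per_session=1.00)
print(spend.try_spend(0.75, "user:alice"))  # True
print(spend.try_spend(0.75, "user:bob"))    # True: separate counter
print(spend.try_spend(0.75, "user:alice"))  # False: alice would reach $1.50
```

Calls made without a session ID all land in the same bucket, so two anonymous callers would draw down one shared budget.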

Handling Budget Exceeded

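from agentguard.core.types import BudgetExceededError

def summarise_document() -> dict:
    """Call the guarded call_openai tool from Basic Usage, degrading gracefully."""
    try:
        result = call_openai("Summarise this 10,000 word document")
        return {"result": result}
    except BudgetExceededError as e:
        print(f"Budget exceeded: spent ${e.spent:.4f} of ${e.limit:.4f}")
        return {"error": "Budget exceeded: try a shorter query"}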

Alert Threshold

The alert_threshold triggers a warning before the budget is exceeded. This lets you monitor spending without hard-blocking calls:

from agentguard.core.types import BudgetConfig

config = BudgetConfig(
    max_cost_per_session=10.00,
    alert_threshold=0.80,   # Warning at $8.00
)

When the alert fires, you'll see a log warning:

WARNING: Budget alert: $8.12 of $10.00 session budget used.
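Because the threshold is a fraction of each configured limit, the same setting yields a different alert point per limit. The helper below is a hypothetical sketch (`alert_points` is not part of agentguard) that simply works through that arithmetic for the two session-level limits.

```python
from typing import Dict, Optional


def alert_points(
    alert_threshold: float,
    max_cost_per_session: Optional[float] = None,
    max_calls_per_session: Optional[int] = None,
) -> Dict[str, float]:
    """Return the usage level at which each configured limit would alert."""
    limits = {
        "max_cost_per_session": max_cost_per_session,
        "max_calls_per_session": max_calls_per_session,
    }
    # Unset limits (None) are unlimited, so they never produce an alert point.
    return {
        name: alert_threshold * limit
        for name, limit in limits.items()
        if limit is not None
    }


print(alert_points(0.75, max_cost_per_session=10.00, max_calls_per_session=200))
# {'max_cost_per_session': 7.5, 'max_calls_per_session': 150.0}
```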

Common Patterns

Hard limit with soft alert

from agentguard.core.types import BudgetConfig, GuardAction

BudgetConfig(
    max_cost_per_session=10.00,
    alert_threshold=0.80,           # Warn at $8
    on_exceed=GuardAction.BLOCK,    # Block at $10
)

Soft limit (log and continue)

Useful for auditing without hard enforcement:

from agentguard.core.types import BudgetConfig, GuardAction

BudgetConfig(
    max_cost_per_session=10.00,
    on_exceed=GuardAction.LOG,  # Record but don't block
)

Per-call limit for expensive single calls

from agentguard.core.types import BudgetConfig

BudgetConfig(
    max_cost_per_call=0.50,   # Each call can't cost more than $0.50
    cost_per_call=None,       # Prefer real usage-based pricing
    use_dynamic_llm_costs=True,
)

Daily budget reset

Create a new GuardedTool instance each day, or use a session_id with a date component:

from datetime import date
from agentguard import GuardConfig
from agentguard.core.types import BudgetConfig

daily_config = GuardConfig(
    budget=BudgetConfig(max_cost_per_session=50.00),
    session_id=f"daily:{date.today().isoformat()}",
)

Troubleshooting

Budget resets unexpectedly

Budgets are tracked in-memory per GuardedTool instance. If you restart your process or create a new instance, the budget resets. For persistent budgets across restarts, persist the spend in your own storage and check it in a before_call hook:

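from agentguard import guard
from agentguard.core.types import BudgetExceededError, ToolCall  # ToolCall import path is assumed

MAX_BUDGET = 10.00  # Example session limit in dollars

# `redis` is assumed to be an already-connected Redis client instance
def check_persistent_budget(call: ToolCall) -> None:
    spent = redis.get(f"budget:{call.session_id}") or 0.0
    if float(spent) >= MAX_BUDGET:
        raise BudgetExceededError(call.tool_name, float(spent), MAX_BUDGET)

@guard(before_call=check_persistent_budget)
def expensive_tool(): ...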
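The hook above only reads the stored total; something also has to write it. As a rough sketch of the storage side, the class below keeps a durable per-session total using the standard-library `sqlite3` module in place of Redis, so it runs without a server. `SqliteSpendStore` and its methods are hypothetical names, and how you feed call costs into `add` depends on your own integration.

```python
import sqlite3


class SqliteSpendStore:
    """Hypothetical durable spend counter keyed by session ID."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to survive restarts.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS spend ("
            "session_id TEXT PRIMARY KEY, total REAL NOT NULL)"
        )
        self._db.commit()

    def spent(self, session_id: str) -> float:
        """Return the recorded total for a session, or 0.0 if none exists."""
        row = self._db.execute(
            "SELECT total FROM spend WHERE session_id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else 0.0

    def add(self, session_id: str, cost: float) -> float:
        """Add cost to the session total and return the new total."""
        # Upsert so the first charge creates the row (needs SQLite 3.24+).
        self._db.execute(
            "INSERT INTO spend (session_id, total) VALUES (?, ?) "
            "ON CONFLICT(session_id) DO UPDATE "
            "SET total = total + excluded.total",
            (session_id, cost),
        )
        self._db.commit()
        return self.spent(session_id)


store = SqliteSpendStore()
store.add("user:alice", 1.50)
print(store.add("user:alice", 0.25))  # 1.75
print(store.spent("user:bob"))        # 0.0
```

A before_call hook could then compare `store.spent(session_id)` against the limit, in the same way the Redis lookup is used above.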

`BudgetExceededError` not raised

Check that on_exceed is set to GuardAction.BLOCK. With WARN or LOG, the call proceeds.

Cost not tracked

For tool calls, cost_per_call remains the fixed-cost mechanism. For LLM API calls, wrap the provider client with guard_openai_client, guard_anthropic_client, or guard_openai_compatible_client. If the response includes usage but pricing cannot be resolved, agentguard records usage and marks cost as unknown unless you explicitly configured cost_per_call as a fallback.