Rate Limiting

Token bucket rate limiting with per-second, per-minute, and per-hour controls.

Token Bucket Algorithm

agentguard uses the token bucket algorithm for rate limiting. Think of a bucket that holds tokens: each call consumes one token, and tokens refill at a steady rate. Bursts of traffic are allowed up to the bucket capacity.

How It Works

Bucket starts full (capacity = burst allowance)
Each call removes 1 token
Tokens refill at a constant rate
If the bucket is empty, the call is rejected with an error or made to wait, depending on `block`
Bucket never exceeds its capacity
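
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the token bucket technique, not agentguard's actual implementation: the `TokenBucket` class, its `try_acquire` method, and the injectable `clock` parameter are all hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical; not agentguard's internals)."""

    def __init__(self, rate_per_second: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_second      # tokens added per second
        self.capacity = capacity         # burst allowance
        self.tokens = float(capacity)    # bucket starts full
        self.clock = clock               # injectable so the sketch can be tested
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check,
        # never exceeding the bucket capacity.
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)

    def try_acquire(self) -> bool:
        # Each call removes one token; an empty bucket rejects the call.
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate_per_second=1.0` and `capacity=3`, three back-to-back calls succeed, the fourth is rejected, and one more call succeeds after a second has passed.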

Configuration

python

from agentguard import guard
from agentguard.config import RateLimitConfig

# `api`, `db`, and `openai` below stand in for your own client objects.

# Per-minute rate limit
@guard(
    rate_limit=RateLimitConfig(
        calls_per_minute=60,    # 60 calls per minute = 1/sec sustained
        burst=10,               # Allow burst of 10 rapid calls
    )
)
def search(query: str) -> dict:
    return api.search(query)

# Per-second rate limit (strict)
@guard(
    rate_limit=RateLimitConfig(
        calls_per_second=5,     # Max 5 calls per second
        burst=5,                # No burst beyond rate
    )
)
def write_db(data: dict) -> bool:
    return db.insert(data)

# Per-hour rate limit (cost control)
@guard(
    rate_limit=RateLimitConfig(
        calls_per_hour=1000,    # 1000 calls per hour
        burst=50,               # Allow short bursts
    )
)
def llm_call(prompt: str) -> str:
    return openai.chat(prompt)

Configuration Fields

Field	Type	Default	Description
`calls_per_second`	`float`	`None`	Maximum sustained calls per second
`calls_per_minute`	`float`	`None`	Maximum sustained calls per minute
`calls_per_hour`	`float`	`None`	Maximum sustained calls per hour
`burst`	`int`	`10`	Maximum burst size (bucket capacity)
`block`	`bool`	`True`	If True, raise `RateLimitExceeded` when the limit is hit. If False, wait until a token is available.

Handling Rate Limit Errors

python

from agentguard.errors import RateLimitExceeded

try:
    result = search("hello")
except RateLimitExceeded as e:
    # e.retry_after gives you the wait time in seconds
    print(f"Rate limited. Retry after {e.retry_after:.1f}s")
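
One common way to use `retry_after` is to sleep for that long and try again, up to a fixed number of attempts. The sketch below shows that pattern under some assumptions: `call_with_retry`, `max_attempts`, and the injectable `sleep` argument are hypothetical names, and the local `RateLimitExceeded` class is only a stand-in so the example runs on its own. In real code you would import the exception from `agentguard.errors` as shown above.

```python
import time


class RateLimitExceeded(Exception):
    """Local stand-in for agentguard.errors.RateLimitExceeded (assumption)."""

    def __init__(self, retry_after: float):
        super().__init__(f"retry after {retry_after:.1f}s")
        self.retry_after = retry_after


def call_with_retry(func, *args, max_attempts=3, sleep=time.sleep, **kwargs):
    """Call func, waiting retry_after seconds between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except RateLimitExceeded as exc:
            if attempt == max_attempts:
                raise                  # out of attempts: surface the error
            sleep(exc.retry_after)     # wait as long as the limiter asks
```

A call such as `call_with_retry(search, "hello")` would then retry up to three times before re-raising the error to the caller.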

✅ Tip: Use burst wisely

Set burst equal to your sustained per-second rate for no burst allowance, or higher to accommodate natural traffic spikes. For interactive agents, a burst of 2-3x the sustained rate works well.
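
When choosing a burst value, it can help to convert the configured limit to a per-second rate and see how long an empty bucket takes to refill. The helpers below are a rough sketch of that arithmetic. `sustained_rate_per_second` and `seconds_to_refill` are hypothetical names, not part of agentguard, and the sketch assumes only one of the three rate fields is set.

```python
def sustained_rate_per_second(calls_per_second=None, calls_per_minute=None,
                              calls_per_hour=None) -> float:
    """Convert whichever rate field is set to calls per second."""
    if calls_per_second is not None:
        return float(calls_per_second)
    if calls_per_minute is not None:
        return calls_per_minute / 60.0
    if calls_per_hour is not None:
        return calls_per_hour / 3600.0
    raise ValueError("set one of the rate fields")


def seconds_to_refill(burst: int, rate_per_second: float) -> float:
    """Time for an empty bucket of size `burst` to fill completely."""
    return burst / rate_per_second


# The per-minute example from Configuration: 60 calls/minute, burst of 10.
rate = sustained_rate_per_second(calls_per_minute=60)
refill = seconds_to_refill(10, rate)   # 10.0 seconds
```

By the same arithmetic, the per-hour example (1000 calls per hour, burst of 50) refills an empty bucket in about 180 seconds, so a caller that spends the whole burst waits roughly three minutes before another full burst is available.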
