Overview
The workflow-level budget: block sets cumulative ceilings on a run. The executor checks the running totals after every node and enforces the limits you declare. At least one of max_tokens, max_cost_usd, or max_duration_s must be set; an empty budget: block is rejected at validation time.
This is not the same as the per-node max_tokens_per_call ceiling: max_tokens_per_call bounds a single response, while budget: bounds the whole run.
Fields
| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| max_tokens | At least one of the three | integer ≥ 1 | none | Maximum total tokens across all nodes in a run. |
| max_cost_usd | At least one of the three | number > 0 | none | Maximum estimated USD spend for the run. Not enforced for models without a pricing entry (e.g. Ollama/local models, whose estimate is None). |
| max_duration_s | At least one of the three | number > 0 | none | Maximum wall-clock seconds for the full run. |
| on_exceeded | No | "abort", "warn", or "skip_remaining" | "abort" | Action taken when any ceiling is hit (see below). |
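The constraints in the table amount to a small validation step. The following is a minimal sketch of those rules, assuming a plain Python dataclass as a stand-in for a parsed budget: block; BudgetConfig and validate are hypothetical names, not part of the package's API.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed on_exceeded values, taken from the Fields table.
ON_EXCEEDED_ACTIONS = {"abort", "warn", "skip_remaining"}


@dataclass
class BudgetConfig:
    """Hypothetical stand-in for a parsed budget: block (not the library's class)."""
    max_tokens: Optional[int] = None
    max_cost_usd: Optional[float] = None
    max_duration_s: Optional[float] = None
    on_exceeded: str = "abort"

    def validate(self) -> None:
        # An empty budget: block is rejected: at least one ceiling must be set.
        if self.max_tokens is None and self.max_cost_usd is None and self.max_duration_s is None:
            raise ValueError("budget: set at least one of max_tokens, max_cost_usd, max_duration_s")
        # max_tokens must be an integer >= 1 (bool is excluded because it subclasses int).
        if self.max_tokens is not None and (
            isinstance(self.max_tokens, bool)
            or not isinstance(self.max_tokens, int)
            or self.max_tokens < 1
        ):
            raise ValueError("budget.max_tokens must be an integer >= 1")
        # Cost and duration ceilings must be numbers strictly greater than zero
        # ("not value > 0" also rejects NaN).
        for name in ("max_cost_usd", "max_duration_s"):
            value = getattr(self, name)
            if value is not None and (
                isinstance(value, bool)
                or not isinstance(value, (int, float))
                or not value > 0
            ):
                raise ValueError(f"budget.{name} must be a number > 0")
        if self.on_exceeded not in ON_EXCEEDED_ACTIONS:
            raise ValueError(f"budget.on_exceeded must be one of {sorted(ON_EXCEEDED_ACTIONS)}")
```

For example, BudgetConfig(max_tokens=50_000).validate() passes, while BudgetConfig().validate() raises ValueError, mirroring the rejection of an empty budget: block.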
on_exceeded actions
| Action | Behaviour when a ceiling is hit |
|---|---|
| abort (default) | Raises BudgetExceededError and stops the run immediately. |
| warn | Logs a structured warning and lets execution continue. |
| skip_remaining | Marks all remaining nodes as skipped (no further LLM calls) and finishes with a successful status. Skipped nodes carry skip_reason: "budget_exceeded" in the trace. |
BudgetExceededError is exported from the top-level package and carries the violation reason, the tokens used, and the USD estimate at the point of the violation.
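The after-every-node check and the three on_exceeded actions can be modelled as a small loop. This is a minimal illustration, not the executor's implementation: run_with_budget, the BudgetExceeded stand-in class and its attribute names, and the convention that a node returns (tokens, estimated_usd) are all hypothetical. The sketch also assumes that a ceiling counts as hit once a total strictly exceeds it, and that unpriced nodes (estimate None) add nothing to the USD total.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("budget_sketch")

# Hypothetical node model: calling it "runs" the node and returns
# (tokens_used, estimated_usd), where estimated_usd is None for unpriced models.
Node = Callable[[], tuple[int, Optional[float]]]


class BudgetExceeded(Exception):
    """Stand-in for BudgetExceededError; attribute names are illustrative only."""

    def __init__(self, reason: str, tokens_used: int, estimated_usd: Optional[float]):
        super().__init__(reason)
        self.reason = reason
        self.tokens_used = tokens_used
        self.estimated_usd = estimated_usd


def run_with_budget(nodes: list[tuple[str, Node]], *, max_tokens: Optional[int] = None,
                    max_cost_usd: Optional[float] = None,
                    max_duration_s: Optional[float] = None,
                    on_exceeded: str = "abort") -> dict:
    start = time.monotonic()
    total_tokens = 0
    total_usd: Optional[float] = None  # stays None until a priced node reports a cost
    exceeded = False
    trace: dict[str, dict] = {}

    for name, node in nodes:
        if exceeded and on_exceeded == "skip_remaining":
            # No further LLM calls; record why the node did not run.
            trace[name] = {"status": "skipped", "skip_reason": "budget_exceeded"}
            continue

        tokens, usd = node()
        trace[name] = {"status": "ok"}
        total_tokens += tokens
        if usd is not None:  # unpriced nodes add nothing to the USD estimate
            total_usd = (total_usd or 0.0) + usd

        # Cumulative check after every node (">" is an assumption of this sketch).
        reason = None
        if max_tokens is not None and total_tokens > max_tokens:
            reason = f"max_tokens exceeded ({total_tokens} > {max_tokens})"
        elif max_cost_usd is not None and total_usd is not None and total_usd > max_cost_usd:
            reason = f"max_cost_usd exceeded ({total_usd:.4f} > {max_cost_usd})"
        elif max_duration_s is not None and time.monotonic() - start > max_duration_s:
            reason = f"max_duration_s exceeded (> {max_duration_s}s)"

        if reason is None or exceeded:
            continue  # nothing new to report (warn logs only the first violation)
        exceeded = True
        if on_exceeded == "abort":
            raise BudgetExceeded(reason, total_tokens, total_usd)
        if on_exceeded == "warn":
            logger.warning("budget exceeded: %s", reason)
        # skip_remaining: later iterations mark their nodes as skipped.

    # Returning normally corresponds to a successful run status.
    return {"trace": trace, "total_tokens": total_tokens, "estimated_usd": total_usd}
```

With nodes that use 400, 700, and 300 tokens, max_tokens=1000, and on_exceeded="skip_remaining", the sketch runs the first two nodes, sees 1,100 tokens after the second, and skips the third. Because the check runs after each node, the recorded total in this model can overshoot the ceiling by up to one node's usage.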
Cost estimation
USD estimates come from a bundled LiteLLM pricing snapshot, refreshed from a local cache when one is available. Models without a pricing entry (Ollama and other local backends) report estimated_usd: null, so a max_cost_usd ceiling is effectively unenforced for them; when running locally, use max_tokens or max_duration_s instead.
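Why a missing price disables the cost ceiling can be seen in a few lines. This is a minimal sketch, assuming a per-token pricing table: the field names input_cost_per_token and output_cost_per_token follow LiteLLM's published pricing JSON, but the model name, the prices, and the helpers estimate_usd and cost_ceiling_hit are made up for illustration and are not the package's code.

```python
from typing import Optional

# Hypothetical per-token prices in USD; real figures come from the bundled
# LiteLLM snapshot and are NOT reproduced here.
PRICING = {
    "example-hosted-model": {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06},
}


def estimate_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return an estimated cost, or None when the model has no pricing entry."""
    price = PRICING.get(model)
    if price is None:
        return None  # e.g. Ollama/local models: no estimate is possible
    return (input_tokens * price["input_cost_per_token"]
            + output_tokens * price["output_cost_per_token"])


def cost_ceiling_hit(estimated: Optional[float], max_cost_usd: float) -> bool:
    # A None estimate can never exceed the ceiling, which is why max_cost_usd
    # is effectively unenforced for unpriced models.
    return estimated is not None and estimated > max_cost_usd
```

Under these made-up prices, 1,000 input and 500 output tokens on example-hosted-model cost about $0.002, while any model missing from the table yields None and never trips the ceiling.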
Budget status in the trace
When a budget: block is configured, the trace summary includes a budget object.
Reading budget state in when: expressions
Budget state is also exposed to edge conditions as _budget, so you can route around expensive work as the run approaches its ceiling.
_budget exposes total_tokens (int) and estimated_usd (float or None). See YAML Reference → when expressions.
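Because estimated_usd can be None, any condition that compares it to a number needs a guard. The sketch below expresses such a routing condition as a plain Python function rather than in the when: expression syntax (documented in the YAML Reference); the function name and the 40,000-token and $0.50 thresholds are hypothetical.

```python
from typing import Optional, TypedDict


class BudgetState(TypedDict):
    """Shape of _budget as described above: total_tokens and estimated_usd."""
    total_tokens: int
    estimated_usd: Optional[float]


def take_cheap_branch(_budget: BudgetState, token_threshold: int = 40_000,
                      usd_threshold: float = 0.50) -> bool:
    """Hypothetical edge condition: route to a cheaper path near the ceiling."""
    if _budget["total_tokens"] >= token_threshold:
        return True
    # estimated_usd is None for unpriced (e.g. local) models, so guard before comparing.
    usd = _budget["estimated_usd"]
    return usd is not None and usd >= usd_threshold
```

Checking tokens first keeps the condition meaningful for local models, where only total_tokens carries information.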
When to use a budget
Use a budget when:
- Running untrusted or open-ended input where token usage is hard to predict.
- Enforcing a hard cost ceiling per run in production.
- Bounding wall-clock time for latency-sensitive integrations.

Related controls:
- Per-node max_tokens_per_call to bound individual responses.
- The cost_cap guardrail for agent-level enforcement.
Cookbook recipe
- Budget Guarded — a workflow that aborts when its USD ceiling is reached.