Overview

The workflow-level budget: block sets cumulative ceilings on a run. The executor checks the running totals after every node and enforces the limits you declare. At least one of max_tokens, max_cost_usd, or max_duration_s must be set — an empty budget: block is rejected at validation time.
budget:
  max_tokens: 50000        # total tokens across all nodes
  max_cost_usd: 5.00       # estimated USD ceiling for the whole run
  max_duration_s: 300      # wall-clock cap for the whole run
  on_exceeded: abort       # abort | warn | skip_remaining
This complements the per-node max_tokens_per_call ceiling: max_tokens_per_call bounds a single response, while budget: bounds the whole run.
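
To make the difference between a per-call cap and a run-level ceiling concrete, here is a minimal Python sketch of a cumulative check. The `Budget` and `Totals` classes and the `violations` helper are hypothetical stand-ins, not sirenspec's own types, and the strict `>` comparison is an assumption about when a ceiling counts as hit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Hypothetical stand-in for the budget: block (not sirenspec's own class)."""
    max_tokens: Optional[int] = None
    max_cost_usd: Optional[float] = None
    max_duration_s: Optional[float] = None


@dataclass
class Totals:
    """Running totals accumulated across nodes."""
    tokens: int = 0
    cost_usd: Optional[float] = 0.0  # None when no price estimate is available
    duration_s: float = 0.0


def violations(budget: Budget, totals: Totals) -> list[str]:
    """Return the name of every ceiling the running totals have passed."""
    hit = []
    if budget.max_tokens is not None and totals.tokens > budget.max_tokens:
        hit.append("max_tokens")
    # A missing estimate (None) leaves the cost ceiling unenforced.
    if (budget.max_cost_usd is not None and totals.cost_usd is not None
            and totals.cost_usd > budget.max_cost_usd):
        hit.append("max_cost_usd")
    if budget.max_duration_s is not None and totals.duration_s > budget.max_duration_s:
        hit.append("max_duration_s")
    return hit


budget = Budget(max_tokens=1000)
totals = Totals()
# Each node uses 400 tokens, which a per-call cap of 400 would allow,
# yet the run total passes 1000 after the third node.
for node_tokens in (400, 400, 400):
    totals.tokens += node_tokens
    if violations(budget, totals):
        break
```

In this sketch the check runs after each node, so the third node completes before the violation is noticed, which mirrors the "checks the running totals after every node" behaviour described above.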

Fields

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| max_tokens | One of the three | integer ≥ 1 | none | Maximum total tokens across all nodes in a run. |
| max_cost_usd | One of the three | number > 0 | none | Maximum estimated USD spend. Not enforced for models without a pricing entry (e.g. Ollama and other local models, where the estimate is None). |
| max_duration_s | One of the three | number > 0 | none | Maximum wall-clock seconds for the full run. |
| on_exceeded | No | "abort", "warn", or "skip_remaining" | "abort" | Action taken when any ceiling is hit (see below). |
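
The field rules in the table can be sketched as a small Python check. This is an illustration only: `budget_errors` is a hypothetical helper, the error messages are invented, and sirenspec's actual validator may behave differently.

```python
CEILINGS = ("max_tokens", "max_cost_usd", "max_duration_s")
ACTIONS = ("abort", "warn", "skip_remaining")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def budget_errors(block: dict) -> list[str]:
    """Return a list of rule violations for a parsed budget: block."""
    errors = []
    # At least one of the three ceilings must be present.
    if not any(block.get(key) is not None for key in CEILINGS):
        errors.append("set at least one of: " + ", ".join(CEILINGS))
    tokens = block.get("max_tokens")
    if tokens is not None and not (
        _is_number(tokens) and isinstance(tokens, int) and tokens >= 1
    ):
        errors.append("max_tokens must be an integer >= 1")
    for key in ("max_cost_usd", "max_duration_s"):
        value = block.get(key)
        if value is not None and not (_is_number(value) and value > 0):
            errors.append(f"{key} must be a number > 0")
    # on_exceeded is optional and defaults to "abort".
    if block.get("on_exceeded", "abort") not in ACTIONS:
        errors.append("on_exceeded must be one of: " + ", ".join(ACTIONS))
    return errors
```

Returning a list rather than raising on the first problem is a design choice for the sketch: it lets a caller report every issue in a block at once.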

on_exceeded actions

| Action | Behaviour when a ceiling is hit |
| --- | --- |
| abort (default) | Raises BudgetExceededError and stops the run immediately. |
| warn | Logs a structured warning and lets execution continue. |
| skip_remaining | Marks all remaining nodes as skipped (no further LLM calls) and finishes with a successful status. Skipped nodes carry skip_reason: "budget_exceeded" in the trace. |
BudgetExceededError is exported from the top-level package and carries the violation reason, tokens used, and the USD estimate at the point of the violation:
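from sirenspec import BudgetExceededError

# Assumes execute, workflow, and user_input are already in scope,
# and that this snippet runs inside an async function.
try:
    trace = await execute(workflow, user_input)
except BudgetExceededError as exc:
    print(f"{exc}: {exc.tokens_used} tokens, ${exc.estimated_usd}")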
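
The three on_exceeded actions can be compared side by side with a toy executor. This is a rough sketch under stated assumptions: `run_nodes` and `SketchBudgetExceeded` are hypothetical names (the real exception is `BudgetExceededError`), nodes are reduced to `(name, tokens)` pairs, and only a token ceiling is modelled.

```python
import logging

logger = logging.getLogger("budget_sketch")


class SketchBudgetExceeded(Exception):
    """Local stand-in for illustration; not sirenspec's BudgetExceededError."""


def run_nodes(nodes, max_tokens, on_exceeded="abort"):
    """Run (name, tokens) pairs and apply on_exceeded once max_tokens is passed."""
    used = 0
    trace = []
    skipping = False
    for name, tokens in nodes:
        if skipping:
            # No further work is done for skipped nodes.
            trace.append({"node": name, "status": "skipped",
                          "skip_reason": "budget_exceeded"})
            continue
        used += tokens  # stands in for an LLM call
        trace.append({"node": name, "status": "ok"})
        if used > max_tokens:  # checked after every node
            if on_exceeded == "abort":
                raise SketchBudgetExceeded(
                    f"max_tokens exceeded: {used} > {max_tokens}")
            if on_exceeded == "warn":
                logger.warning("budget exceeded: %d > %d tokens",
                               used, max_tokens)
            elif on_exceeded == "skip_remaining":
                skipping = True
    return trace


NODES = [("a", 60), ("b", 60), ("c", 60)]
```

With a ceiling of 100 tokens, node `b` pushes the total to 120: `abort` raises at that point, `warn` logs and still runs `c`, and `skip_remaining` returns normally with `c` marked as skipped.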

Cost estimation

USD estimates come from a bundled LiteLLM pricing snapshot, refreshed from a local cache when available. Models without a pricing entry (Ollama and other local backends) report estimated_usd: null, and a max_cost_usd ceiling is effectively unenforced for those models — use max_tokens or max_duration_s instead when running locally.
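
Why a missing price leaves the cost ceiling unenforced can be seen in a small sketch. The `PRICES` table, its model name, and its per-token figures are invented for illustration and are not taken from any real pricing snapshot; `estimate_usd` and `cost_ceiling_hit` are hypothetical helpers.

```python
from typing import Optional

# Hypothetical per-token prices in USD, invented for this example.
PRICES = {"example-hosted-model": {"input": 1e-6, "output": 2e-6}}


def estimate_usd(model: str, input_tokens: int,
                 output_tokens: int) -> Optional[float]:
    """Return an estimated cost, or None when the model has no pricing entry."""
    price = PRICES.get(model)
    if price is None:
        return None
    return input_tokens * price["input"] + output_tokens * price["output"]


def cost_ceiling_hit(estimated_usd: Optional[float],
                     max_cost_usd: float) -> bool:
    """With no estimate there is nothing to compare, so the ceiling never trips."""
    if estimated_usd is None:
        return False
    return estimated_usd > max_cost_usd
```

Because `cost_ceiling_hit` can never return True for an unpriced model, a run on a local backend needs a token or duration ceiling to be bounded at all.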

Budget status in the trace

When a budget: block is configured, the trace summary includes a budget object:
{
  "summary": {
    "total_tokens": 1832,
    "budget": {
      "max_tokens": 50000,
      "max_cost_usd": 5.0,
      "max_duration_s": 300,
      "on_exceeded": "abort",
      "tokens_used": 1832,
      "estimated_usd": 0.0021,
      "duration_s": 3.514,
      "exceeded": false,
      "violations": [],
      "skipped_remaining": false
    }
  }
}
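
As one way to consume this object, the sketch below computes the remaining headroom for each configured ceiling from a trace shaped like the example above. The `budget_headroom` helper is hypothetical, and the pairing of each limit with its usage field is inferred from the field names in that example.

```python
import json

# A trace shaped like the documented example.
TRACE_JSON = """
{
  "summary": {
    "total_tokens": 1832,
    "budget": {
      "max_tokens": 50000,
      "max_cost_usd": 5.0,
      "max_duration_s": 300,
      "on_exceeded": "abort",
      "tokens_used": 1832,
      "estimated_usd": 0.0021,
      "duration_s": 3.514,
      "exceeded": false,
      "violations": [],
      "skipped_remaining": false
    }
  }
}
"""

# Assumed pairing of each ceiling with the field that reports usage.
LIMIT_TO_USAGE = {
    "max_tokens": "tokens_used",
    "max_cost_usd": "estimated_usd",
    "max_duration_s": "duration_s",
}


def budget_headroom(trace: dict) -> dict:
    """Return limit minus usage for every ceiling that has both values."""
    budget = trace["summary"]["budget"]
    headroom = {}
    for limit_key, usage_key in LIMIT_TO_USAGE.items():
        limit = budget.get(limit_key)
        used = budget.get(usage_key)
        # Skip unset ceilings and missing estimates (e.g. a null estimated_usd).
        if limit is not None and used is not None:
            headroom[limit_key] = limit - used
    return headroom


headroom = budget_headroom(json.loads(TRACE_JSON))
```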

Reading budget state in when: expressions

Budget state is also exposed to edge conditions as _budget, so you can route around expensive work as the run approaches its ceiling:
edges:
  - from: initial
    to: expensive_processing
    when: _budget.estimated_usd < 2.0
_budget exposes total_tokens (int) and estimated_usd (float, or None for models without a pricing entry). See YAML Reference → when expressions.

When to use a budget

Use a budget when:
  • Running untrusted or open-ended input where token usage is hard to predict.
  • Enforcing a hard cost ceiling per run in production.
  • Bounding wall-clock time for latency-sensitive integrations.

Cookbook recipe

  • Budget Guarded — a workflow that aborts when its USD ceiling is reached.
See the YAML Reference for the full field listing.