Overview
LLM provider APIs fail transiently: rate limits (429), overloaded servers (503), and flaky networks are normal in production. Without retry logic, a single transient error silently kills an otherwise healthy run. SirenSpec’s retry policy system gives you configurable backoff, jitter, and structured failure handling with zero boilerplate.
retry block
Add a retry block to any node to override the retry behaviour for that node.
Fields
| Field | Required | Default | Description |
|---|---|---|---|
| max_attempts | No | 1 | Total number of attempts, including the first. |
| backoff | No | constant | Delay growth strategy: constant, linear, or exponential. |
| base_delay | No | 1.0 | Delay in seconds before attempt 2. |
| max_delay | No | 60.0 | Maximum delay in seconds, regardless of backoff math. |
| jitter | No | false | When true, adds a random ±20 % offset to each delay. |
| on | No | [429, network_error] | Trigger conditions. Integers match HTTP status codes; network_error matches connection failures; guardrail_violation matches an output GuardrailViolation. |
| retry_on_guardrail | No | false | When true, output guardrail checks run inside the retry loop so a GuardrailViolation re-runs the LLM call instead of failing immediately. |
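
To make the matching rules for the on field concrete, here is a minimal Python sketch. It is not SirenSpec's implementation: the function name should_retry and its parameters are hypothetical, and only the trigger values and the default list come from the table above.

```python
from typing import Optional, Sequence, Union

Trigger = Union[int, str]

# Default trigger list, as documented in the Fields table.
DEFAULT_ON: Sequence[Trigger] = (429, "network_error")


def should_retry(
    on: Sequence[Trigger],
    status_code: Optional[int] = None,
    network_error: bool = False,
    guardrail_violation: bool = False,
) -> bool:
    """Return True if the observed failure matches any trigger in `on`.

    Hypothetical helper: the caller describes the failure with a status
    code and/or boolean flags instead of passing a real exception object.
    """
    for trigger in on:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(trigger, int) and not isinstance(trigger, bool):
            if status_code is not None and trigger == status_code:
                return True
        elif trigger == "network_error" and network_error:
            return True
        elif trigger == "guardrail_violation" and guardrail_violation:
            return True
    return False
```

Under the documented default, a 429 or a connection failure matches, while a 503 does not; a node that should retry on overloaded servers would need 503 in its on list.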
Backoff strategies
| Strategy | Delay formula |
|---|---|
| constant | base_delay on every retry |
| linear | base_delay × attempt |
| exponential | base_delay × 2^(attempt-1) |
The computed delay is always capped at max_delay.
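
The formulas in the table can be sketched in Python as follows. This is an illustration, not SirenSpec's code: the function name compute_delay is hypothetical, attempt is assumed to be the 1-based number of the attempt that just failed (so attempt 1 yields base_delay, the delay before attempt 2), and jitter is assumed to be applied before the max_delay cap.

```python
import random


def compute_delay(
    attempt: int,
    backoff: str = "constant",
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = False,
) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")

    if backoff == "constant":
        delay = base_delay
    elif backoff == "linear":
        delay = base_delay * attempt
    elif backoff == "exponential":
        delay = base_delay * 2 ** (attempt - 1)
    else:
        raise ValueError(f"unknown backoff strategy: {backoff!r}")

    if jitter:
        # Assumption: the random ±20 % offset is applied before the cap,
        # so the returned value never exceeds max_delay.
        delay *= random.uniform(0.8, 1.2)

    return min(delay, max_delay)
```

With the default base_delay of 1.0 and no jitter, the exponential strategy waits 1, 2, 4, and 8 seconds after attempts 1 through 4, and any computed delay above 60 seconds is clamped to 60.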
Retrying on guardrail violations
By default, retries only fire on transient transport errors (HTTP codes and network failures). Output guardrails, such as schema, run after the retry loop, so a malformed-but-successful response fails the run outright.
Set retry_on_guardrail: true to fold output guardrail checks into the retry loop. A GuardrailViolation then counts as a retryable error and triggers another LLM call, giving the model additional chances to produce output that satisfies the guardrail. Add guardrail_violation to the on list to make the trigger explicit.
This setting only has an effect on nodes that define a schema (or other output) guardrail; see Guardrails.
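
The difference between the two modes can be sketched in Python. This is a simplified illustration under stated assumptions: run_node, call_llm, and check_output are hypothetical names, GuardrailViolation is a local stand-in class, and transport-error retries and backoff delays are left out so that only the guardrail behaviour is visible.

```python
from typing import Callable


class GuardrailViolation(Exception):
    """Local stand-in for the error raised by an output guardrail."""


def run_node(
    call_llm: Callable[[], str],
    check_output: Callable[[str], None],
    max_attempts: int,
    retry_on_guardrail: bool,
) -> str:
    """Run one LLM call plus its output guardrail check."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    if not retry_on_guardrail:
        # Default: the guardrail runs after the retry loop, so a violation
        # fails outright without another LLM call.
        output = call_llm()
        check_output(output)
        return output

    # retry_on_guardrail: the check runs inside the loop and a violation
    # triggers another LLM call until attempts run out.
    for attempt in range(1, max_attempts + 1):
        output = call_llm()
        try:
            check_output(output)
            return output
        except GuardrailViolation:
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```

In this sketch, a model that returns invalid output twice and valid output on the third call succeeds with max_attempts set to 3 when retry_on_guardrail is true, and fails after a single call when it is false.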
on_failure block
on_failure controls what happens when all retry attempts are exhausted.
Fields
| Field | Required | Default | Description |
|---|---|---|---|
| action | No | abort | What to do when retries are exhausted. |
| fallback_node | No | — | Node ID to route to when action: fallback. |
| default_output | No | — | Static value written to the node’s writes path when action: use_default. |
Actions
| Action | Behaviour |
|---|---|
| abort | Raises RetryExhaustedError and stops the run immediately. |
| fallback | Routes execution to fallback_node. The failed node is marked skipped. |
| skip | Silently skips the node; downstream nodes that depend on its output receive nothing. |
| use_default | Writes default_output to the node’s writes path and continues normally. |
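
The four actions can be read as a simple dispatch. The Python sketch below is illustrative only: handle_exhausted, FailureOutcome, the status strings, and the flat state dictionary keyed by writes path are all hypothetical, and the exception classes are local stand-ins rather than SirenSpec's own.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class ProviderError(Exception):
    """Local stand-in for the library's base provider error."""


class RetryExhaustedError(ProviderError):
    """Local stand-in raised when action is abort."""


@dataclass
class FailureOutcome:
    status: str                      # hypothetical: "skipped" or "completed"
    next_node: Optional[str] = None  # set only for the fallback action


def handle_exhausted(
    node_id: str,
    on_failure: Dict[str, Any],
    state: Dict[str, Any],
    writes: str,
    last_error: BaseException,
) -> FailureOutcome:
    """Apply the on_failure action once all retry attempts have failed."""
    action = on_failure.get("action", "abort")  # abort is the documented default

    if action == "abort":
        raise RetryExhaustedError(f"node {node_id!r} exhausted its retries") from last_error
    if action == "fallback":
        # The failed node is marked skipped; execution continues at fallback_node.
        return FailureOutcome("skipped", next_node=on_failure["fallback_node"])
    if action == "skip":
        # Nothing is written, so downstream readers receive nothing.
        return FailureOutcome("skipped")
    if action == "use_default":
        state[writes] = on_failure["default_output"]
        return FailureOutcome("completed")
    raise ValueError(f"unknown on_failure action: {action!r}")
```

Note that skip leaves the writes path empty, so use_default is usually the safer choice when downstream nodes cannot tolerate a missing value.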
Workflow-level defaults
Set retry and failure defaults at the top level of your workflow. Every node that does not specify its own retry or on_failure block inherits these defaults.
Node-level retry and on_failure blocks completely override the defaults for that node; they are not merged field-by-field.
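
The override rule is easy to get wrong, so here is a small Python sketch of it. The function effective_retry and the dictionary representation of a workflow are hypothetical. The sketch also assumes that fields missing from the chosen block fall back to the documented per-field defaults rather than to the workflow-level values.

```python
from typing import Any, Dict

# Per-field defaults from the retry Fields table.
BUILTIN_RETRY: Dict[str, Any] = {
    "max_attempts": 1,
    "backoff": "constant",
    "base_delay": 1.0,
    "max_delay": 60.0,
    "jitter": False,
    "on": [429, "network_error"],
    "retry_on_guardrail": False,
}


def effective_retry(workflow_defaults: Dict[str, Any], node: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve the retry settings that apply to a single node."""
    if "retry" in node:
        # A node-level block replaces the workflow block wholesale.
        block = node["retry"]
    else:
        block = workflow_defaults.get("retry", {})
    # Assumption: unspecified fields use the documented field defaults.
    return {**BUILTIN_RETRY, **block}
```

For example, if the workflow default sets max_attempts to 5 with exponential backoff and a node declares only max_attempts: 2, this sketch resolves that node to 2 attempts with constant backoff, because the workflow-level backoff is not inherited.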
Error types
| Exception | When raised |
|---|---|
| RetryExhaustedError | All retry attempts failed and on_failure.action is abort (or unset). Subclass of ProviderError. |
RetryExhaustedError carries the node ID, the number of attempts made, and the last upstream exception so you can log and inspect the root cause.
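
As a rough illustration of how that information might be used, the Python sketch below defines local stand-in classes and formats a log line from them. The attribute names node_id, attempts, and last_error are hypothetical; check the actual exception class for the real names before relying on them.

```python
class ProviderError(Exception):
    """Local stand-in for the library's base provider error."""


class RetryExhaustedError(ProviderError):
    """Local stand-in; attribute names here are assumptions."""

    def __init__(self, node_id: str, attempts: int, last_error: BaseException):
        super().__init__(
            f"node {node_id!r} failed after {attempts} attempt(s): {last_error!r}"
        )
        self.node_id = node_id
        self.attempts = attempts
        self.last_error = last_error


def describe_failure(exc: RetryExhaustedError) -> str:
    """Build a one-line summary suitable for a log message."""
    root = type(exc.last_error).__name__
    return f"{exc.node_id}: {exc.attempts} attempts, root cause {root}"
```

Because the error is a subclass of ProviderError, an existing handler for ProviderError also catches it; a dedicated except clause is only needed when retry exhaustion should be reported differently.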
Tracing
Every retry attempt is recorded in the run trace under the node’s retry_attempts list.