Guardrails - SirenSpec

Overview

Guardrails run on every node execution — checking input before it reaches the LLM and validating (or transforming) output before it is written to the context. By default, the injection guardrail is active on all agents. You can configure guardrails at the workflow level, per agent, or disable them entirely.

guardrails:        # workflow-level (applies to all agents by default)
  - injection
  - length

Built-in Guardrails

`injection`

Detects common prompt-injection patterns in both input and output text. If an injection signature is detected, the node fails immediately with a GuardrailViolation and the workflow status is set to "failed". Detected patterns include:

ignore previous instructions
disregard your instructions
you are now [role]
forget your instructions
new instructions:
override previous instructions
act as a [role]
pretend you are [role]
your new role is
system: you are

Detection is case-insensitive. Default: Always active unless explicitly overridden with an empty list or a list that omits injection.
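As a rough illustration of case-insensitive signature matching, the sketch below checks text against a handful of regular expressions. The pattern list, the `check_injection` function, and the local `GuardrailViolation` class are hypothetical stand-ins that approximate a few of the phrases listed above; the guardrail's real signatures and internals may differ.

```python
import re

# Hypothetical patterns approximating a few of the phrases listed above.
INJECTION_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"disregard\s+your\s+instructions",
    r"you\s+are\s+now\s+\w+",
    r"new\s+instructions:",
]


class GuardrailViolation(Exception):
    """Local stand-in for the exception named on this page."""


def check_injection(text: str) -> str:
    """Return the text unchanged, or raise if any pattern matches."""
    for pattern in INJECTION_PATTERNS:
        # re.IGNORECASE makes the match case-insensitive.
        if re.search(pattern, text, flags=re.IGNORECASE):
            raise GuardrailViolation(f"Injection pattern detected: {pattern!r}")
    return text
```

Because the same check applies to input and output, a function like this could be called once before the LLM call and once on the response.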

`length`

Limits the length of LLM output. In the default "truncate" mode, a response longer than the limit is silently trimmed and "..." is appended.

Parameter	Default	Description
`max_chars`	`4000`	Maximum allowed output length in characters.
`mode`	`truncate`	`"truncate"` appends `"..."` and trims; `"raise"` raises a `GuardrailViolation`.

The length guardrail only checks output — input is passed through unchanged.
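The two modes can be sketched as follows. This is a minimal illustration, not the library's code: `check_length` and the local `GuardrailViolation` class are hypothetical. It also assumes the text is trimmed to `max_chars` before the marker is appended, which this page does not specify.

```python
class GuardrailViolation(Exception):
    """Local stand-in for the exception named on this page."""


def check_length(output: str, max_chars: int = 4000, mode: str = "truncate") -> str:
    """Apply a length limit to LLM output in 'truncate' or 'raise' mode."""
    if len(output) <= max_chars:
        return output
    if mode == "raise":
        raise GuardrailViolation(
            f"Output is {len(output)} characters; limit is {max_chars}"
        )
    if mode == "truncate":
        # Assumption: trim to max_chars first, then append the marker,
        # so the result can be up to three characters over the limit.
        return output[:max_chars] + "..."
    raise ValueError(f"unknown mode: {mode!r}")
```

If downstream nodes depend on complete output (for example, JSON), "raise" is the safer choice, since truncation can cut a structure in half without any error.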

`pii`

Detects personally identifiable information in both input and output text and either redacts it, blocks the call, or passes it through with a flag. Supported entities are email, phone, ssn, and credit_card. Credit-card matches are filtered through the Luhn checksum to suppress false positives.

Configuration:

Parameter	Type	Default	Description
`entities`	list of strings	all four entities	Subset of `["email", "phone", "ssn", "credit_card"]` to detect.
`action`	string	`"redact"`	`"redact"` replaces each match with `replacement`; `"block"` raises `PIIDetectedError`; `"flag"` leaves the text unchanged.
`replacement`	string	`"[REDACTED]"`	Replacement string used when `action: "redact"`.

Behavior:

Input: Inspected before the prompt is sent to the LLM.
Output: Inspected after the response is received, before downstream nodes see it.
On action: "block": Raises PIIDetectedError (a GuardrailError subclass) listing the entity types that matched.
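To show why the Luhn filter suppresses false positives, here is a small sketch of the checksum combined with a redaction pass. The Luhn algorithm itself is standard. The candidate regex, `luhn_valid`, and `redact_cards` are illustrative assumptions and are not taken from the guardrail's implementation.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated by
# single spaces or hyphens. The guardrail's real pattern may differ.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact_cards(text: str, replacement: str = "[REDACTED]") -> str:
    """Replace only those digit runs that pass the Luhn checksum."""
    def substitute(match: re.Match) -> str:
        return replacement if luhn_valid(match.group(0)) else match.group(0)

    return CARD_CANDIDATE.sub(substitute, text)
```

A 16-digit order number that fails the checksum is left untouched, while a well-formed card number is replaced.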

Example — redact emails and phones in both directions:

guardrails:
  - name: pii
    config:
      entities: ["email", "phone"]
      action: redact
      replacement: "[REDACTED]"

Example — block any credit-card leakage from a finance agent:

agents:
  finance:
    model: "openai:gpt-4o-mini"
    system: "Answer finance questions without ever quoting card numbers."
    guardrails:
      - name: pii
        config:
          entities: ["credit_card"]
          action: block

`schema`

Validates LLM output against a JSON Schema Draft 7 definition. The guardrail parses the output text as JSON and checks it against the provided schema. Input text is passed through unchanged. Use this guardrail when you need the LLM to produce structured JSON output that conforms to a specific schema, for example when extracting data into a fixed set of fields.

Configuration:

The schema guardrail requires a name and a config dict with a schema key:

guardrails:
  - name: schema
    config:
      schema:
        type: "object"
        properties:
          name:
            type: "string"
          age:
            type: "integer"
            minimum: 0
        required: ["name", "age"]

Behavior:

Input: Passed through unchanged.
Output: Parsed as JSON and validated against the schema.
On failure: Raises a GuardrailViolation with a message indicating the path and constraint that failed (e.g., Schema violation at "$.age": 25 is greater than the maximum of 20).

Pair the schema guardrail with retry.retry_on_guardrail: true so a GuardrailViolation re-runs the LLM call instead of failing the run — giving the model another chance to emit conforming JSON. See Retry Policies.
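The parse, validate, and retry flow can be sketched with the standard library alone. The checks below are hand-written for the name/age schema shown above and are a deliberately tiny stand-in, not a Draft 7 validator. `check_person_output`, `run_with_retry`, `max_attempts`, and the local `GuardrailViolation` class are hypothetical names.

```python
import json


class GuardrailViolation(Exception):
    """Local stand-in for the exception named on this page."""


def check_person_output(text: str) -> dict:
    """Parse LLM output as JSON and apply the name/age constraints by hand."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise GuardrailViolation(f"Output is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise GuardrailViolation('Schema violation at "$": expected an object')
    for key in ("name", "age"):
        if key not in data:
            raise GuardrailViolation(f'Schema violation at "$": {key!r} is required')
    if not isinstance(data["name"], str):
        raise GuardrailViolation('Schema violation at "$.name": expected a string')
    age = data["age"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        raise GuardrailViolation('Schema violation at "$.age": expected an integer')
    if age < 0:
        raise GuardrailViolation(f'Schema violation at "$.age": {age} is less than 0')
    return data


def run_with_retry(call_llm, max_attempts: int = 3) -> dict:
    """Re-run the call when validation fails, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return check_person_output(call_llm())
        except GuardrailViolation as exc:
            last_error = exc
    raise last_error
```

Here `call_llm` is any zero-argument callable that returns the model's text. The loop only mimics the idea of retrying on a guardrail failure and says nothing about how the retry policy is actually configured.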

Example:

agents:
  extractor:
    model: "openai:gpt-4o-mini"
    system: "Extract the person's name and age as JSON: {\"name\": \"...\", \"age\": ...}"
    guardrails:
      - name: schema
        config:
          schema:
            type: "object"
            properties:
              name:
                type: "string"
              age:
                type: "integer"
                minimum: 0
                maximum: 150
            required: ["name", "age"]

`cost_cap`

Enforces token and/or USD budget ceilings on a workflow. After each node executes, the guardrail checks cumulative token usage and estimated cost. If a limit is exceeded, it either aborts the workflow or logs a warning and continues. At least one of max_usd or max_tokens must be specified.

Parameter	Type	Default	Description
`max_usd`	float	None	Maximum estimated USD spend for the entire run.
`max_tokens`	int	None	Maximum token ceiling independent of cost.
`action`	string	`"abort"`	`"abort"` raises `BudgetExceededError` and stops execution; `"warn"` logs and continues.

Behavior:

Input/Output: Both are pass-throughs; the guardrail only examines budget state.
Budget checking: Runs after each node completes. The budget state is also exposed to when: conditions as _budget.total_tokens and _budget.estimated_usd.
On action: "abort": Raises BudgetExceededError with details of the violation; the workflow transitions to "failed" status.
On action: "warn": Logs a structured warning and allows execution to continue.
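The behavior listed above can be approximated with a check that runs after each node. This is a sketch under assumptions: the `Budget` dataclass, `check_budget`, and the local `BudgetExceededError` class are hypothetical, and treating only values strictly greater than the ceiling as violations is a guess.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("cost_cap_sketch")


class BudgetExceededError(Exception):
    """Local stand-in for the exception named on this page."""


@dataclass
class Budget:
    total_tokens: int = 0
    estimated_usd: Optional[float] = None  # None when no pricing is known


def check_budget(budget: Budget, max_usd: Optional[float] = None,
                 max_tokens: Optional[int] = None, action: str = "abort") -> bool:
    """Return True if within budget; abort or warn when a ceiling is exceeded."""
    if max_usd is None and max_tokens is None:
        raise ValueError("at least one of max_usd or max_tokens must be specified")
    problems = []
    if max_tokens is not None and budget.total_tokens > max_tokens:
        problems.append(f"tokens {budget.total_tokens} > {max_tokens}")
    # Without pricing, estimated_usd is None and the USD ceiling is skipped.
    if (max_usd is not None and budget.estimated_usd is not None
            and budget.estimated_usd > max_usd):
        problems.append(f"usd {budget.estimated_usd} > {max_usd}")
    if not problems:
        return True
    message = "; ".join(problems)
    if action == "abort":
        raise BudgetExceededError(message)
    logger.warning("budget exceeded: %s", message)
    return False
```

With `action="warn"` the function returns False instead of raising, which corresponds to logging and letting execution continue.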

Example:

guardrails:
  - name: cost_cap
    config:
      max_usd: 5.0
      max_tokens: 10000
      action: abort

Use in a conditional edge to skip expensive nodes once a budget threshold is reached:

edges:
  - from: classify
    to: expensive_analysis
    when: _budget.estimated_usd < 3.0  # only run if we have budget left

  - from: expensive_analysis
    to: end
    when: true

  - from: classify
    to: cheap_fallback
    when: _budget.estimated_usd >= 3.0  # skip expensive path when budget low

  - from: cheap_fallback
    to: end
    when: true

The cost_cap guardrail works best when pricing information is available for your LLM models. If no pricing is found, estimated_usd will be None and only the max_tokens ceiling will be enforced.

Configuration

Workflow-level (default for all agents)

guardrails:
  - injection
  - length

If guardrails is omitted from the workflow file, only injection is active.

Per-agent override

An agent’s guardrails field completely replaces the workflow-level list for that agent:

guardrails:
  - injection
  - length

agents:
  summarizer:
    model: "openai:gpt-4o-mini"
    system: "Summarise the text."
    guardrails: ["length"]   # injection disabled for this agent only

  responder:
    model: "openai:gpt-4o-mini"
    system: "You are a support agent."
    # no override — inherits [injection, length] from workflow level

Disabling all guardrails

Set an empty list to disable all guardrails for the workflow or a specific agent:

# Disable for the entire workflow
guardrails: []

# Disable for one agent
agents:
  internal_tool:
    model: "openai:gpt-4o-mini"
    system: "Internal tool with no user-facing output."
    guardrails: []

Disabling the injection guardrail removes protection against prompt-injection attacks. Only do this for agents that process fully trusted input.
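The resolution rules in this section (agent list replaces workflow list, omitted means injection only, empty list disables everything) can be summarized in a few lines. `effective_guardrails` and `DEFAULT_GUARDRAILS` are hypothetical names used only for this sketch, with `None` standing for an omitted guardrails field.

```python
from typing import List, Optional

DEFAULT_GUARDRAILS = ["injection"]  # active when nothing is configured


def effective_guardrails(workflow: Optional[List[str]],
                         agent: Optional[List[str]]) -> List[str]:
    """Resolve which guardrails apply to one agent."""
    if agent is not None:
        # An agent-level list replaces the workflow list; it is not merged.
        return list(agent)
    if workflow is not None:
        return list(workflow)
    return list(DEFAULT_GUARDRAILS)
```

For the per-agent example above, `effective_guardrails(["injection", "length"], ["length"])` yields only the length guardrail for the summarizer, while the responder, which has no override, inherits both.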

Execution Trace

Guardrails that pass are recorded in each node’s trace entry:

{
  "id": "answer",
  "guardrails_passed": [
    "InjectionGuardrail.check_input",
    "InjectionGuardrail.check_output",
    "LengthGuardrail.check_output"
  ],
  "error": null
}

A GuardrailViolation is recorded in the node’s error field, and the workflow’s summary.status is set to "failed":

{
  "id": "answer",
  "error": "GuardrailViolation: Injection pattern detected in input: 'ignore\\s+(all\\s+)?...'",
  "guardrails_passed": []
}
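Trace entries shaped like the ones above are easy to inspect programmatically. The helpers below are a hypothetical sketch that assumes the entries have already been loaded into Python dicts. How the trace is stored or retrieved is not covered here.

```python
from typing import Dict, List


def failed_nodes(entries: List[dict]) -> Dict[str, str]:
    """Map node id to error message for every entry with a non-null error."""
    return {e["id"]: e["error"] for e in entries if e.get("error") is not None}


def passed_guardrail(entry: dict, class_name: str) -> bool:
    """True if any recorded check belongs to the named guardrail class."""
    checks = entry.get("guardrails_passed", [])
    # Entries look like "LengthGuardrail.check_output"; compare the class part.
    return any(item.split(".", 1)[0] == class_name for item in checks)
```

A quick audit might call `passed_guardrail(entry, "InjectionGuardrail")` for each node to confirm that the injection check ran where it was expected.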
