# Guardrails

> Guardrails protect every agent call with input validation and output filtering.

## Overview

Guardrails run on every node execution — checking input before it reaches the LLM and validating (or transforming) output before it is written to the context.

By default, the `injection` guardrail is active on all agents. You can configure guardrails at the workflow level or per agent, or disable them entirely.

```yaml theme={null}
guardrails:        # workflow-level (applies to all agents by default)
  - injection
  - length
```

***

## Built-in Guardrails

### `injection`

Detects common prompt-injection patterns in both input and output text.

If an injection signature is detected, the node fails immediately with a `GuardrailViolation` and the workflow status is set to `"failed"`.

**Detected patterns include:**

* `ignore previous instructions`
* `disregard your instructions`
* `you are now [role]`
* `forget your instructions`
* `new instructions:`
* `override previous instructions`
* `act as a [role]`
* `pretend you are [role]`
* `your new role is`
* `system: you are`

Detection is case-insensitive.

**Default:** Always active unless explicitly overridden with an empty list or a list that omits `injection`.
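
The matching described above can be pictured as a small set of case-insensitive regular expressions. The following is a minimal sketch, not the library's implementation: the `check_injection` helper, the exact regexes, and the local `GuardrailViolation` class are assumptions made for illustration.

```python
import re


class GuardrailViolation(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


# Illustrative regexes for a few of the listed phrases (assumed, not the real signatures).
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+your\s+instructions", re.IGNORECASE),
    re.compile(r"forget\s+your\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+\w+", re.IGNORECASE),
]


def check_injection(text: str, direction: str = "input") -> str:
    """Return the text unchanged, or raise if any pattern matches."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            raise GuardrailViolation(
                f"Injection pattern detected in {direction}: {pattern.pattern!r}"
            )
    return text
```

Because the patterns allow flexible whitespace and ignore case, `"IGNORE  all previous instructions"` would be rejected just like the lowercase form.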

### `length`

Limits the length of LLM output. In the default `"truncate"` mode, responses longer than the limit are silently trimmed and suffixed with `"..."`.

| Parameter   | Default    | Description                                                                      |
| ----------- | ---------- | -------------------------------------------------------------------------------- |
| `max_chars` | `4000`     | Maximum allowed output length in characters.                                     |
| `mode`      | `truncate` | `"truncate"` trims and appends `"..."`; `"raise"` raises a `GuardrailViolation`. |

<Note>
  The `length` guardrail only checks output — input is passed through unchanged.
</Note>
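
As a rough illustration of the two modes, the sketch below applies the limit to an output string. The `apply_length` function and the local `GuardrailViolation` class are hypothetical, and it assumes the text is trimmed to `max_chars` before the ellipsis is appended; the real guardrail may count the ellipsis toward the limit.

```python
class GuardrailViolation(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


def apply_length(text: str, max_chars: int = 4000, mode: str = "truncate") -> str:
    """Enforce an output length limit in 'truncate' or 'raise' mode."""
    if len(text) <= max_chars:
        return text  # within the limit: pass through unchanged
    if mode == "raise":
        raise GuardrailViolation(
            f"Output length {len(text)} exceeds max_chars={max_chars}"
        )
    if mode == "truncate":
        # Assumption: trim to max_chars first, then append the ellipsis.
        return text[:max_chars] + "..."
    raise ValueError(f"unknown mode: {mode!r}")
```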

### `pii`

Detects personally identifiable information in both input and output text and either redacts it, blocks the call, or passes it through with a flag. Supported entities are `email`, `phone`, `ssn`, and `credit_card`. Credit-card matches are filtered through the Luhn checksum to suppress false positives.
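
For reference, the Luhn checksum mentioned above can be sketched in a few lines. The `luhn_valid` helper is a hypothetical name used for illustration; only the checksum algorithm itself is standard.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    # Keep ASCII digits only, so separators such as spaces and dashes are ignored.
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A digit run that merely looks like a card number, such as `1234-5678-9012-3456`, fails the checksum and would not be reported as a `credit_card` match under this scheme.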

**Configuration:**

| Parameter     | Type            | Default           | Description                                                                                                                 |
| ------------- | --------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------- |
| `entities`    | list of strings | all four entities | Subset of `["email", "phone", "ssn", "credit_card"]` to detect.                                                             |
| `action`      | string          | `"redact"`        | `"redact"` replaces each match with `replacement`; `"block"` raises `PIIDetectedError`; `"flag"` leaves the text unchanged. |
| `replacement` | string          | `"[REDACTED]"`    | Replacement string used when `action: "redact"`.                                                                            |

**Behavior:**

* **Input:** Inspected before the prompt is sent to the LLM.
* **Output:** Inspected after the response is received, before downstream nodes see it.
* **On `action: "block"`:** Raises `PIIDetectedError` (a `GuardrailError` subclass) listing the entity types that matched.
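
The three actions can be compared side by side with a small sketch. Everything here is illustrative: `apply_pii`, the simplified regexes, and the local `PIIDetectedError` class are assumptions, and returning the matched entity types alongside the text is just one possible way to surface a flag.

```python
import re


class PIIDetectedError(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


# Deliberately simple patterns for illustration; real detectors are stricter.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def apply_pii(text, entities=("email", "ssn"), action="redact",
              replacement="[REDACTED]"):
    """Return (text, matched_entity_types) after applying the chosen action."""
    if action not in ("redact", "block", "flag"):
        raise ValueError(f"unknown action: {action!r}")
    found = sorted(name for name in entities if PATTERNS[name].search(text))
    if action == "block" and found:
        raise PIIDetectedError(f"PII detected: {', '.join(found)}")
    if action == "redact":
        for name in found:
            text = PATTERNS[name].sub(replacement, text)
    # "flag" leaves the text untouched and only reports what matched.
    return text, found
```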

**Example — redact emails and phones in both directions:**

```yaml theme={null}
guardrails:
  - name: pii
    config:
      entities: ["email", "phone"]
      action: redact
      replacement: "[REDACTED]"
```

**Example — block any credit-card leakage from a finance agent:**

```yaml theme={null}
agents:
  finance:
    model: "openai:gpt-4o-mini"
    system: "Answer finance questions without ever quoting card numbers."
    guardrails:
      - name: pii
        config:
          entities: ["credit_card"]
          action: block
```

### `schema`

Validates LLM output against a JSON Schema Draft 7 definition. The guardrail parses the output text as JSON and checks it against the provided schema. Input text is passed through unchanged.

Use this guardrail when you need the LLM to produce structured JSON output that conforms to a specific schema — for example, extracting data into a fixed set of fields.

**Configuration:**

The `schema` guardrail requires a `name` and a `config` dict with a `schema` key:

```yaml theme={null}
guardrails:
  - name: schema
    config:
      schema:
        type: "object"
        properties:
          name:
            type: "string"
          age:
            type: "integer"
            minimum: 0
        required: ["name", "age"]
```

**Behavior:**

* **Input:** Passed through unchanged.
* **Output:** Parsed as JSON and validated against the schema.
* **On failure:** Raises a `GuardrailViolation` with a message indicating the path and constraint that failed (e.g., `Schema violation at "$.age": 25 is greater than the maximum of 20`).
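
To make the parse-then-validate flow concrete, here is a simplified, standard-library-only sketch. It hand-checks only `type`, `required`, `minimum`, and `maximum` for a flat object, so it is not a Draft 7 validator; `check_output`, `TYPE_CHECKS`, and the local `GuardrailViolation` class are hypothetical, and treating non-JSON output as a violation is an assumption.

```python
import json


class GuardrailViolation(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
    },
    "required": ["name", "age"],
}

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def check_output(raw: str, schema: dict = SCHEMA) -> dict:
    """Parse `raw` as JSON and check it against a small subset of JSON Schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise GuardrailViolation(f"Output is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise GuardrailViolation('Schema violation at "$": expected an object')
    for key in schema.get("required", []):
        if key not in data:
            raise GuardrailViolation(
                f'Schema violation at "$": {key!r} is a required property'
            )
    for key, rules in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        if not TYPE_CHECKS[rules["type"]](value):
            raise GuardrailViolation(
                f'Schema violation at "$.{key}": {value!r} is not of type {rules["type"]!r}'
            )
        if "minimum" in rules and value < rules["minimum"]:
            raise GuardrailViolation(
                f'Schema violation at "$.{key}": {value} is less than the minimum of {rules["minimum"]}'
            )
        if "maximum" in rules and value > rules["maximum"]:
            raise GuardrailViolation(
                f'Schema violation at "$.{key}": {value} is greater than the maximum of {rules["maximum"]}'
            )
    return data
```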

<Tip>
  Pair the `schema` guardrail with `retry.retry_on_guardrail: true` so a `GuardrailViolation` re-runs the LLM call instead of failing the run — giving the model another chance to emit conforming JSON. See [Retry Policies](/retry-policies#retrying-on-guardrail-violations).
</Tip>

**Example:**

```yaml theme={null}
agents:
  extractor:
    model: "openai:gpt-4o-mini"
    system: "Extract the person's name and age as JSON: {\"name\": \"...\", \"age\": ...}"
    guardrails:
      - name: schema
        config:
          schema:
            type: "object"
            properties:
              name:
                type: "string"
              age:
                type: "integer"
                minimum: 0
                maximum: 150
            required: ["name", "age"]
```

### `cost_cap`

Enforces token and/or USD budget ceilings on a workflow. After each node executes, the guardrail checks cumulative token usage and estimated cost. If a limit is exceeded, it either aborts the workflow or logs a warning.

At least one of `max_usd` or `max_tokens` must be specified.

| Parameter    | Type   | Default   | Description                                                                              |
| ------------ | ------ | --------- | ---------------------------------------------------------------------------------------- |
| `max_usd`    | float  | `None`    | Maximum estimated USD spend for the entire run.                                          |
| `max_tokens` | int    | `None`    | Maximum cumulative tokens for the entire run, independent of cost.                       |
| `action`     | string | `"abort"` | `"abort"` raises `BudgetExceededError` and stops execution; `"warn"` logs and continues. |

**Behavior:**

* **Input/Output:** Both are pass-throughs; the guardrail only examines budget state.
* **Budget checking:** Runs after each node completes. The running totals are available to `when:` conditions as `_budget.total_tokens` and `_budget.estimated_usd`.
* **On `action: "abort"`:** Raises `BudgetExceededError` with details of the violation; the workflow transitions to `"failed"` status.
* **On `action: "warn"`:** Logs a structured warning and allows execution to continue.
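
The behavior above can be sketched as a check that runs against the running totals after each node. This is an illustrative model only: the `Budget` dataclass, `check_budget`, and the local `BudgetExceededError` class are hypothetical, and the sketch assumes a limit counts as exceeded only when the total is strictly greater than it.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("cost_cap_sketch")


class BudgetExceededError(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


@dataclass
class Budget:
    total_tokens: int = 0
    estimated_usd: Optional[float] = None  # None when no pricing is known


def check_budget(budget: Budget, max_usd: Optional[float] = None,
                 max_tokens: Optional[int] = None, action: str = "abort") -> bool:
    """Return True while within budget; on a breach, abort or warn."""
    if max_usd is None and max_tokens is None:
        raise ValueError("at least one of max_usd or max_tokens is required")
    breaches = []
    if max_tokens is not None and budget.total_tokens > max_tokens:
        breaches.append(f"tokens {budget.total_tokens} > {max_tokens}")
    # The USD ceiling can only be enforced when an estimate exists.
    if (max_usd is not None and budget.estimated_usd is not None
            and budget.estimated_usd > max_usd):
        breaches.append(f"usd {budget.estimated_usd} > {max_usd}")
    if not breaches:
        return True
    message = "; ".join(breaches)
    if action == "abort":
        raise BudgetExceededError(message)
    logger.warning("budget exceeded: %s", message)  # "warn": log and continue
    return False
```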

**Example:**

```yaml theme={null}
guardrails:
  - name: cost_cap
    config:
      max_usd: 5.0
      max_tokens: 10000
      action: abort
```

Reference the budget state in a conditional edge to skip expensive nodes once a budget threshold is reached:

```yaml theme={null}
edges:
  - from: classify
    to: expensive_analysis
    when: _budget.estimated_usd < 3.0  # only run if we have budget left

  - from: expensive_analysis
    to: end
    when: true

  - from: classify
    to: cheap_fallback
    when: _budget.estimated_usd >= 3.0  # skip expensive path when budget low

  - from: cheap_fallback
    to: end
    when: true
```

<Note>
  The `cost_cap` guardrail can enforce `max_usd` only when pricing information is available for your models. If no pricing is found, `estimated_usd` is `None` and only the `max_tokens` ceiling is enforced.
</Note>

***

## Configuration

### Workflow-level (default for all agents)

```yaml theme={null}
guardrails:
  - injection
  - length
```

If `guardrails` is omitted from the workflow file, only `injection` is active.

### Per-agent override

An agent's `guardrails` field completely replaces the workflow-level list for that agent:

```yaml theme={null}
guardrails:
  - injection
  - length

agents:
  summarizer:
    model: "openai:gpt-4o-mini"
    system: "Summarise the text."
    guardrails: ["length"]   # injection disabled for this agent only

  responder:
    model: "openai:gpt-4o-mini"
    system: "You are a support agent."
    # no override — inherits [injection, length] from workflow level
```

### Disabling all guardrails

Set `guardrails` to an empty list to disable all guardrails for the workflow or for a specific agent:

```yaml theme={null}
# Disable for the entire workflow
guardrails: []

# Disable for one agent
agents:
  internal_tool:
    model: "openai:gpt-4o-mini"
    system: "Internal tool with no user-facing output."
    guardrails: []
```

<Warning>
  Disabling the `injection` guardrail removes protection against prompt-injection attacks. Only do this for agents that process fully trusted input.
</Warning>
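
The resolution rules in this section can be summarized in a short sketch. The `effective_guardrails` function is hypothetical; it models an omitted `guardrails` key as `None` and an explicit empty list as `[]`.

```python
from typing import List, Optional

DEFAULT_GUARDRAILS = ["injection"]


def effective_guardrails(workflow: Optional[List[str]],
                         agent: Optional[List[str]]) -> List[str]:
    """Resolve the guardrail list for one agent.

    None means the key was omitted; [] means guardrails were explicitly disabled.
    """
    if agent is not None:
        return list(agent)  # the agent list fully replaces the workflow list
    if workflow is not None:
        return list(workflow)  # inherit the workflow-level list
    return list(DEFAULT_GUARDRAILS)  # nothing configured: injection only
```

Note that the agent list is never merged with the workflow list, which is why `["length"]` on an agent drops `injection` for that agent.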

***

## Execution Trace

Guardrails that pass are recorded in each node's trace entry:

```json theme={null}
{
  "id": "answer",
  "guardrails_passed": [
    "InjectionGuardrail.check_input",
    "InjectionGuardrail.check_output",
    "LengthGuardrail.check_output"
  ],
  "error": null
}
```

A `GuardrailViolation` populates the node's `error` field and sets the workflow's `summary.status` to `"failed"`:

```json theme={null}
{
  "id": "answer",
  "error": "GuardrailViolation: Injection pattern detected in input: 'ignore\\s+(all\\s+)?...'",
  "guardrails_passed": []
}
```
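
When inspecting a trace programmatically, entries like the ones above can be filtered on the `error` field. The sketch below assumes the node entries are already loaded as a list of dicts; how the trace is stored or exposed is not covered here, and `guardrail_failures` and the sample node ids are hypothetical.

```python
from typing import Dict, List

# Sample node entries shaped like the trace fragments shown above.
TRACE_NODES: List[Dict] = [
    {
        "id": "classify",
        "guardrails_passed": [
            "InjectionGuardrail.check_input",
            "InjectionGuardrail.check_output",
        ],
        "error": None,
    },
    {
        "id": "answer",
        "guardrails_passed": [],
        "error": "GuardrailViolation: Injection pattern detected in input",
    },
]


def guardrail_failures(nodes: List[Dict]) -> List[str]:
    """Return the ids of nodes whose error was raised by a guardrail."""
    return [
        node["id"]
        for node in nodes
        if (node.get("error") or "").startswith("GuardrailViolation")
    ]
```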
