Guardrails

Guardrails

Overview

The Guardrails node validates content using AI-powered checks to ensure safety, accuracy, and compliance. Each guardrail uses an LLM as a judge to evaluate your input against specific criteria, failing the workflow if confidence thresholds are exceeded.

Best for: Content moderation, PII detection, hallucination checks, jailbreak prevention, and custom validation rules.

How It Works

1

Provide input content to validate

Input comes from previous nodes in the workflow.

2

Enable specific guardrail checks

Select which guardrails to run (PII, Moderation, Jailbreak, Hallucination, Custom, etc.).

3

Set confidence threshold for each check

Thresholds range from 0–1 and determine how strict each check is.

4

Choose an AI model for evaluation

More capable models provide more accurate detection but cost more.

5

Evaluate results

If any check exceeds its threshold → the Guardrails node fails and flags the issue.

Configuration

Input

The content you want to validate. Supports Manual, Auto, and Prompt AI modes.

Example usage:

Model Selection

Choose the AI model used to evaluate all enabled guardrails. More capable models provide more accurate detection but may cost more.

Available Guardrails

Personally Identifiable Information (PII)

Detects personal information like names, emails, phone numbers, addresses, SSNs, credit cards, etc.

When to use:

  • Before storing user-generated content

  • When sharing data externally

  • Compliance requirements (GDPR, HIPAA)

  • Customer service workflows

Configuration:

  • Confidence Threshold: 0.7 (recommended)

  • Higher threshold = stricter detection

Example:


Moderation

Checks for inappropriate, harmful, or offensive content including hate speech, violence, adult content, harassment, etc.

When to use:

  • User-generated content platforms

  • Public-facing communications

  • Community moderation

  • Customer-facing outputs

Configuration:

  • Confidence Threshold: 0.6 (recommended)

  • Adjust based on your content policies


Jailbreak Detection

Identifies attempts to bypass AI safety controls or manipulate the AI into unintended behaviors.

When to use:

  • Processing user prompts before sending to AI

  • Public AI interfaces

  • Workflows with user-provided instructions

  • Security-sensitive applications

Configuration:

  • Confidence Threshold: 0.7 (recommended)

  • Higher threshold for fewer false positives

Example:


Hallucination Detection

Detects when AI-generated content contains false or unverifiable information.

When to use:

  • Fact-based content generation

  • Customer support responses

  • Financial or medical information

  • Any workflow where accuracy is critical

Configuration:

  • Confidence Threshold: 0.6 (recommended)

  • Requires reference data for comparison

Example:


Custom Evaluation

Define your own validation criteria using natural language instructions.

When to use:

  • Domain-specific validation

  • Brand voice compliance

  • Custom business rules

  • Specialized content requirements

Configuration:

  • Evaluation Criteria: Describe what to check for

  • Confidence Threshold: Set based on strictness needed

Example:

Setting Confidence Thresholds

The confidence threshold determines how strict each check is:

Threshold
Behavior
Use When

0.3–0.5

Lenient

Avoid false positives, informational only

0.6–0.7

Balanced

Most use cases, good accuracy

0.8–0.9

Strict

High-risk scenarios, critical validation

0.9–1.0

Very Strict

Only flag very obvious violations

Start with 0.7 as a balanced default, then adjust based on false positives or missed detections.

Example Workflows

Content Moderation Pipeline

AI Response Validation

Multi-Check Validation

Handling Failures

When a guardrail check fails, the workflow stops at the Guardrails node. Configure error handling to route to alternative paths, send notifications, or trigger fallback actions (manual review queues, logging, alerts, retries, etc.).

When to Use Each Guardrail

PII Detection — Use for:

  • Public content that shouldn’t contain personal information

  • Data being sent to third parties or external systems

  • Compliance-sensitive workflows (GDPR, HIPAA, etc.)

  • Preventing accidental exposure of sensitive user data

Moderation — Use for:

  • User-generated content that needs review

  • Public-facing outputs and communications

  • Community platforms and forums

  • Filtering inappropriate or harmful content

Jailbreak Detection — Use for:

  • User-provided prompts or instructions to AI

  • Public AI interfaces accessible to external users

  • Security-critical applications where prompt manipulation is a risk

  • Protecting against attempts to bypass system constraints

Hallucination Detection — Use for:

  • Fact-based content generation requiring accuracy

  • Customer support responses with specific information

  • Financial or medical information where accuracy is critical

  • Any content where false information could cause harm

Custom Evaluation — Use for:

  • Brand compliance and tone of voice guidelines

  • Domain-specific rules and industry standards

  • Quality standards unique to your organization

  • Business-specific requirements not covered by other guardrails

Best Practices

  • Enable Multiple Checks: Combine guardrails (e.g., PII + Moderation) for comprehensive validation.

  • Start with Balanced Thresholds: Begin with 0.7 and adjust based on results.

  • Always Handle Failures: Add error paths to notify teams, log violations, or trigger alternative actions.

  • Test with Edge Cases: Calibrate thresholds using borderline content.

  • Use Appropriate Models: More capable models (e.g., GPT-4) provide better detection but cost more.

  • Document Custom Evaluations: Write clear, specific criteria for custom evaluations.

Next Steps