Skip to main content

LLM Guardrails and Safety Patterns

· 2 min read
Sanjeev Sarda
High Performance Developer

Notes on AI/LLM guardrails and safety patterns from a book on "Agentic Design Patterns" by one of Google's Distinguished Engineers, Antonio Gulli.

walking

Guardrails and Safety Patterns

Provide a protective layer to guide agent behavior in order to prevent "harmful, biased, irrelevant or otherwise undesirable responses".

Other reasons for guardrails:

  • Legal
  • Compliance

Guardrails can be implemented using "lower power" but faster LLM models. They should be teamed up with observability to understand when they're being triggered, false positives and general user behavior.

Where to Apply Guardrails

  • Input validation - filter malicious content and prevent jailbreak attacks.

  • Prompt or behavioural constraints - directly instructing the LLM, explicitly preventing or allowing tool use.

  • External moderation APIs, "human in the loop" or other LLMs options for output validation.

Guardrail Prompts

  • A general purpose safety prompt - company policy
  • Permissible input prompt
  • A structured output definition prompt
  • Policy determination by a prompt (input or output validation, what policy does it break?)
  • Technical guard rail prompt to verify the output of other prompts
  • Jailbreak prompt

https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0#heading=h.pxcur8v2qagu - "Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems" by Antonio Gulli