Back to blog

February 4, 2026

OG-OpenClawGuard v1: Protecting AI Agents from Hidden Prompt Injection in Long Content

OpenGuardrails Team ยท Product Announcementsยท5 min readยทProduct, Security, Open Source

AI agents have evolved beyond simple chat assistants.

Modern agent runtimes like OpenClaw (ClawdBot) can:

  • Read and summarize web pages
  • Process incoming emails
  • Analyze uploaded documents
  • Execute multi-step workflows autonomously

This power opens a critical attack surface: indirect prompt injection.

The Threat: Hidden Instructions in Untrusted Content

Indirect prompt injection occurs when attackers embed malicious instructions inside content that an AI agent will process. Unlike direct attacks where users type malicious prompts, these attacks hide in:

  • Emails that appear to be normal business correspondence
  • Web pages with hidden text or instructions
  • Documents with invisible or disguised commands
  • Code comments or metadata fields

When an AI agent reads this content, it may follow the hidden instructions โ€” overriding its original task, leaking sensitive data, or executing dangerous commands.

โ€œA single malicious email could instruct your AI to 'ignore previous guidelines' and exfiltrate your SSH keys.โ€

This is not a theoretical risk. Research has demonstrated successful injection attacks across major AI systems. As agents gain more capabilities, the consequences become more severe.

Introducing OG-OpenClawGuard

Today, we release OG-OpenClawGuard v1 โ€” an open-source plugin for OpenClaw that provides real-time protection against indirect prompt injection attacks.

OG-OpenClawGuard is powered by OpenGuardrails' state-of-the-art detection model, which achieves:

  • 87.1% F1 on English prompt injection detection
  • 97.3% F1 on multilingual prompt detection
  • 274.6ms P95 latency for production-grade performance
  • 119 languages supported

How It Works

OG-OpenClawGuard hooks into OpenClaw's tool result pipeline. When your agent reads a file, fetches a webpage, or receives any external content, OG-OpenClawGuard:

  • Chunks long content โ€” Splits content into 4000-character chunks with 200-character overlap for thorough analysis
  • Analyzes each chunk โ€” Sends each chunk to the OG-Text model with focused detection prompts
  • Aggregates findings โ€” Combines results across all chunks to determine if injection is present
  • Blocks or warns โ€” Prevents compromised content from reaching the agent, with detailed explanations

All analysis happens automatically and transparently โ€” your agent continues working while OG-OpenClawGuard silently guards every piece of external content it processes.

Built-in Commands for Visibility

OG-OpenClawGuard includes slash commands for monitoring and feedback:

  • /og_status โ€” View detection statistics and configuration
  • /og_report โ€” See recent injection detections with details
  • /og_feedback โ€” Report false positives or missed detections

User feedback is stored locally and helps improve detection quality over time. We believe security tools should be transparent and accountable.

Quick Start

Getting started takes seconds:

openclaw plugins install og-openclawguard
openclaw gateway restart

That's it. Your OpenClaw agent is now protected against prompt injection in emails, documents, and web content.

Open Source, Continuously Improving

OG-OpenClawGuard is fully open source under the MIT license.

We are committed to continuous improvement. As new attack vectors emerge, we will update detection capabilities. As users provide feedback, we will refine accuracy. Our goal is to make indirect prompt injection a solved problem for personal AI agents.

Part of the OpenGuardrails Ecosystem

OG-OpenClawGuard joins OG Personal as part of our growing ecosystem of open-source AI security tools. While OG Personal provides comprehensive observability and threat detection across all agent activities, OG-OpenClawGuard focuses specifically on the critical problem of content-based injection attacks.

Together, they represent our vision: AI security that's visible, controllable, and accessible to everyone.

Questions or feedback? Reach out to thomas@openguardrails.com or open an issue on GitHub.