AI agents have evolved beyond simple chat assistants.
Modern agent runtimes like OpenClaw (ClawdBot) can:
- Read and summarize web pages
- Process incoming emails
- Analyze uploaded documents
- Execute multi-step workflows autonomously
This power opens a critical attack surface: indirect prompt injection.
The Threat: Hidden Instructions in Untrusted Content
Indirect prompt injection occurs when attackers embed malicious instructions inside content that an AI agent will process. Unlike direct attacks where users type malicious prompts, these attacks hide in:
- Emails that appear to be normal business correspondence
- Web pages with hidden text or instructions
- Documents with invisible or disguised commands
- Code comments or metadata fields
When an AI agent reads this content, it may follow the hidden instructions โ overriding its original task, leaking sensitive data, or executing dangerous commands.
โA single malicious email could instruct your AI to 'ignore previous guidelines' and exfiltrate your SSH keys.โ
This is not a theoretical risk. Research has demonstrated successful injection attacks across major AI systems. As agents gain more capabilities, the consequences become more severe.
Introducing OG-OpenClawGuard
Today, we release OG-OpenClawGuard v1 โ an open-source plugin for OpenClaw that provides real-time protection against indirect prompt injection attacks.
OG-OpenClawGuard is powered by OpenGuardrails' state-of-the-art detection model, which achieves:
- 87.1% F1 on English prompt injection detection
- 97.3% F1 on multilingual prompt detection
- 274.6ms P95 latency for production-grade performance
- 119 languages supported
How It Works
OG-OpenClawGuard hooks into OpenClaw's tool result pipeline. When your agent reads a file, fetches a webpage, or receives any external content, OG-OpenClawGuard:
- Chunks long content โ Splits content into 4000-character chunks with 200-character overlap for thorough analysis
- Analyzes each chunk โ Sends each chunk to the OG-Text model with focused detection prompts
- Aggregates findings โ Combines results across all chunks to determine if injection is present
- Blocks or warns โ Prevents compromised content from reaching the agent, with detailed explanations
All analysis happens automatically and transparently โ your agent continues working while OG-OpenClawGuard silently guards every piece of external content it processes.
Built-in Commands for Visibility
OG-OpenClawGuard includes slash commands for monitoring and feedback:
- /og_status โ View detection statistics and configuration
- /og_report โ See recent injection detections with details
- /og_feedback โ Report false positives or missed detections
User feedback is stored locally and helps improve detection quality over time. We believe security tools should be transparent and accountable.
Quick Start
Getting started takes seconds:
openclaw plugins install og-openclawguardopenclaw gateway restart
That's it. Your OpenClaw agent is now protected against prompt injection in emails, documents, and web content.
Open Source, Continuously Improving
OG-OpenClawGuard is fully open source under the MIT license.
We are committed to continuous improvement. As new attack vectors emerge, we will update detection capabilities. As users provide feedback, we will refine accuracy. Our goal is to make indirect prompt injection a solved problem for personal AI agents.
Part of the OpenGuardrails Ecosystem
OG-OpenClawGuard joins OG Personal as part of our growing ecosystem of open-source AI security tools. While OG Personal provides comprehensive observability and threat detection across all agent activities, OG-OpenClawGuard focuses specifically on the critical problem of content-based injection attacks.
Together, they represent our vision: AI security that's visible, controllable, and accessible to everyone.
Questions or feedback? Reach out to thomas@openguardrails.com or open an issue on GitHub.