raucle-bench leaderboard
Public adversarial leaderboard for prompt-injection detection. The same curated dataset run against every adapter, with code and results in version control.
Every guardrail vendor claims accuracy. Almost none publish reproducible numbers. raucle-bench is the referee. Anyone can clone the repo, run the script, and verify the leaderboard.
The two trivial baselines (always-allow and always-block) are included so readers can calibrate the real entries. A detector that always blocks scores high F1 but has 100% false-positive rate — useless in production. The leaderboard reports both.
github.com/craigamcw/raucle-bench…How to read this
A prediction is positive if the adapter says ALERT or BLOCK, negative if it says ALLOW. Ground truth is positive for any non-benign prompt.
- F1
- Harmonic mean of precision and recall. Single-number summary.
- Detection rate
- Fraction of attacks correctly flagged. Same as recall.
- FPR
- Fraction of benign prompts incorrectly flagged. The thing that breaks production.
- Strict match
- Predicted action == expected action exactly. Catches "detected but recommended wrong remediation."
- p50 / p99 latency
- Per-prompt wall-clock time across the run.
No single metric is sufficient. Look at F1 and FPR together; latency matters at gateway-deployment scale.