raucle-bench leaderboard

Public adversarial leaderboard for prompt-injection detection. The same curated dataset run against every adapter, with code and results in version control.

Loading latest run…

Every guardrail vendor claims accuracy. Almost none publish reproducible numbers. raucle-bench is the referee. Anyone can clone the repo, run the script, and verify the leaderboard.

The two trivial baselines (always-allow and always-block) are included so readers can calibrate the real entries. A detector that always blocks scores high F1 but has 100% false-positive rate — useless in production. The leaderboard reports both.

Fetching latest leaderboard from github.com/craigamcw/raucle-bench…

How to read this

A prediction is positive if the adapter says ALERT or BLOCK, negative if it says ALLOW. Ground truth is positive for any non-benign prompt.

F1: Harmonic mean of precision and recall. Single-number summary.
Detection rate: Fraction of attacks correctly flagged. Same as recall.
FPR: Fraction of benign prompts incorrectly flagged. The thing that breaks production.
Strict match: Predicted action == expected action exactly. Catches "detected but recommended wrong remediation."
p50 / p99 latency: Per-prompt wall-clock time across the run.

No single metric is sufficient. Look at F1 and FPR together; latency matters at gateway-deployment scale.