Proofs, Not Statistics — Formal Verification for AI Guardrails
Every AI security product on the market ships with the same sentence: "tested against 10,000 attacks." That is statistics over a sample. Today raucle-detect v0.9.0 ships the alternative — actual SMT-backed proofs that no tool call, URL, or SQL query a bounded agent can emit will violate its policy.
The number "10,000" is doing an enormous amount of work in the average AI-security pitch deck. It is the entire substance of the safety claim: a vendor ran their guardrail against some test set, the guardrail blocked most of it, and from that the buyer is asked to infer that the next attack — the one nobody has thought of yet — will also be blocked. Software security spent the 1990s and 2000s outgrowing exactly this argument. Memory-safety bugs weren't eliminated by testing harder; they were eliminated by writing code in languages where the bug class is structurally impossible to express. We can borrow that move.
The trick is to narrow the question. You cannot prove anything useful about the space of all English strings, because the grammar is unbounded and the semantics are open. But agentic AI doesn't ultimately act in English. It acts through machine-readable interfaces: a JSON tool call, an HTTP request, a SQL query. Those grammars are bounded. The set of strings an agent can emit through them is enumerable. Over those, we can prove things.
Three first-cut provers
v0.9.0 ships three. Each takes a grammar and a policy and returns one of three outcomes: PROVEN, REFUTED with a concrete counterexample, or UNDECIDED when the solver times out.
1. JSONSchemaProver
The big one. Every well-engineered agent in 2026 declares its tool interface as JSON Schema. Take the schema, take a policy, ask Z3 whether a counterexample exists. If unsat, the policy is proven complete over the schema.
from raucle_detect.prove import JSONSchemaProver

schema = {
    "type": "object",
    "properties": {
        "to": {"type": "string"},
        "amount": {"type": "number", "minimum": 0, "maximum": 1_000_000},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
    "required": ["to", "amount", "currency"],
}

policy = {
    "max_value": {"amount": 100},
    "forbidden_values": {"to": ["attacker@evil.example"]},
}

result = JSONSchemaProver().prove(schema, policy)
# REFUTED — Z3 produces a concrete counterexample:
# {"to": "alice@example.com", "amount": 1000000, "currency": "USD"}
That's not "we tested 10,000 transfers and none exceeded $100". That's "the SMT solver found an exact assignment that satisfies the schema and violates the policy". The schema permits transfers up to a million; the policy caps at a hundred; the gap is real and now you have a witness for it.
Tighten the schema so its "maximum" matches the policy cap, rerun, and the same call returns PROVEN.
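The REFUTED-to-PROVEN flip is easy to see in miniature. The following is a toy stand-in for the one-variable case (the `prove_numeric_bound` function and its return shape are illustrative, not the raucle-detect API): when a field's schema bound exceeds the policy cap, the bound itself is a witness; when it doesn't, no satisfying assignment can violate the cap.

```python
def prove_numeric_bound(schema_field: dict, policy_max: float):
    """Toy bound check: does the schema admit a value above the policy cap?

    A real prover hands this question to Z3; for a single bounded
    numeric field it is decidable by comparing interval endpoints.
    """
    hi = schema_field.get("maximum", float("inf"))
    if hi > policy_max:
        # The schema's own upper bound is a concrete counterexample.
        return ("REFUTED", {"amount": hi})
    return ("PROVEN", None)

loose = {"type": "number", "minimum": 0, "maximum": 1_000_000}
tight = {"type": "number", "minimum": 0, "maximum": 100}

print(prove_numeric_bound(loose, 100))  # ('REFUTED', {'amount': 1000000})
print(prove_numeric_bound(tight, 100))  # ('PROVEN', None)
```

Tightening the grammar, not weakening the policy, is what turns the counterexample into a proof.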
Every proof result carries a hash, content-addressed over (status, prover, prover_version, grammar_hash, policy_hash, counterexample, notes). It drops straight into the v0.5.0 verdict receipt and v0.4.0 audit chain. Proof artifacts become first-class citizens of the trust graph — you can attach a policy-proof hash to every receipt that touches this tool and answer, three days later, "on what date was this policy formally complete over this interface?"
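Content-addressing of that sort needs nothing beyond the standard library. A minimal sketch, assuming canonical JSON (sorted keys, fixed separators) as the encoding; the field names follow the tuple above, but the exact serialisation is an assumption, not the shipped format:

```python
import hashlib
import json

def proof_hash(status, prover, prover_version, grammar_hash, policy_hash,
               counterexample, notes):
    """Hash the proof fields over a canonical JSON encoding.

    Sorted keys and fixed separators make the digest deterministic,
    so identical proofs always land at the same address.
    """
    payload = {
        "status": status,
        "prover": prover,
        "prover_version": prover_version,
        "grammar_hash": grammar_hash,
        "policy_hash": policy_hash,
        "counterexample": counterexample,
        "notes": notes,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the grammar and policy hashes are inputs, re-proving an unchanged interface reproduces the same address, and any change to either side produces a new one.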
2. URLPolicyProver
For HTTP-tool agents. Declare your grammar (which schemes, which hosts, which path prefixes, which query keys the agent can use) and the policy (require_https, forbid query keys, host allowlist with wildcards, max path depth). Counterexamples are concrete URLs.
grammar = {
    "schemes": ["https"],
    "hosts": ["api.example.com", "evil.attacker.com"],
    "path_prefixes": ["/v1/"],
}

policy = {"host_allowlist": ["*.example.com"]}

URLPolicyProver().prove(grammar, policy)
# REFUTED — counterexample: {"scheme": "https", "host": "evil.attacker.com", ...}
The counterexample is a sharp pointer. It tells you exactly which host the grammar allows that the policy doesn't. Fix the grammar (remove the host) or fix the policy (broaden the allowlist) — the proof is the spec for which one is wrong.
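Because the declared host set is finite, the allowlist question reduces to checking every grammar host against every pattern — enumeration is already a decision procedure. A toy version (the function name and return shape are illustrative) using the standard library's `fnmatch` for wildcard matching:

```python
from fnmatch import fnmatch

def prove_host_allowlist(hosts, allowlist):
    """Toy host check: every grammar host must match some allowlist pattern.

    The first unmatched host is returned as a concrete counterexample;
    if none exists, the allowlist is complete over the grammar.
    """
    for host in hosts:
        if not any(fnmatch(host, pattern) for pattern in allowlist):
            return ("REFUTED", {"host": host})
    return ("PROVEN", None)

print(prove_host_allowlist(
    ["api.example.com", "evil.attacker.com"], ["*.example.com"]
))
# ('REFUTED', {'host': 'evil.attacker.com'})
```

Dropping the offending host from the grammar, or adding a matching pattern to the policy, flips the result to PROVEN — which of the two is correct is exactly what the counterexample tells you to decide.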
3. SQLClauseProver
Modelling arbitrary SQL in SMT is outside the scope of any honest one-month project, so we don't. Instead: the agent's SQL has to come from a finite set of statement templates. Prove the policy over each template. Forbidden tokens (DROP, DELETE, TRUNCATE), statement chaining (;), allowed tables. Counterexamples include the template plus the rule that broke.
grammar = {
    "templates": [
        "SELECT id, name FROM customers WHERE tenant_id = ?",
        "SELECT * FROM secrets",
    ],
    "allowed_tables": ["customers", "invoices"],
}

policy = {"allowed_tables": ["customers", "invoices"]}

SQLClauseProver().prove(grammar, policy)
# REFUTED — counterexample: {
#     "template": "SELECT * FROM secrets",
#     "violation": "table 'SECRETS' not in allowed_tables"
# }
Honest scope
The thing this release deliberately does not do is the thing other tools brag about: it doesn't claim to verify your whole agent. It verifies the bounded interface. The honest claim is "no string in this declared grammar violates this declared policy" — not "your agent is safe in general". When the grammar is the tool-call schema, that turns out to be a very useful claim.
Recursive schemas, arbitrary regex constraints on string fields, full open-grammar SQL — these raise UnsupportedGrammar and refuse to run. Pretending to prove things you can't is exactly the failure mode of the existing market. Refusing is the better default.
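Refusing honestly is cheap to implement. A sketch of such a gate for recursive schemas, assuming `$ref` as the recursion marker (the exception name comes from the text above; the walk itself is illustrative):

```python
class UnsupportedGrammar(Exception):
    """Raised when a grammar falls outside the decidable fragment."""

def check_supported(schema: dict) -> None:
    """Refuse schemas the prover cannot honestly handle.

    "$ref" can introduce recursion, which the bounded encoding does
    not cover, so the only honest answer is to raise before proving.
    """
    def walk(node):
        if isinstance(node, dict):
            if "$ref" in node:
                raise UnsupportedGrammar("$ref/recursive schemas are out of scope")
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
    walk(schema)
```

A gate like this runs before any solver call, so an unsupported grammar can never silently degrade into a vacuous PROVEN.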
How this composes
The whole point of building this on top of the v0.4.0 / v0.5.0 / v0.6.0 / v0.8.0 stack is that proofs compose with it:
- A `ProofResult.hash` binds to the v0.5.0 verdict receipt for any scan against that interface. The receipt now answers not just "was this input clean?" but "was the policy structurally complete over the interface this input came through?"
- The audit chain (v0.4.0) records when each proof was issued, so a forensic review six months later can answer "did this agent have a current proof at the time of the incident?"
- The IOC feed (v0.8.0) and the proof layer (v0.9.0) are complementary: feeds catch known badness fast; proofs guarantee structural impossibility for declared interfaces. Both signals flow into the same receipt.
- Counterfactual replay (v0.6.0) can now ask "if we'd had today's proof at last week's incident time, what would have changed?"
Where this goes next
The first cut is intentionally narrow. Three obvious extensions are queued:
- String-regex literals as bounded grammars. A policy of the form "this string field must match `^[a-z0-9]{1,32}$`" can be encoded in Z3's theory of strings and proven against an over-permissive schema.
- JSON Schema `oneOf`/`anyOf`. Currently disjunctive schemas raise `UnsupportedGrammar`. The encoding exists; we just haven't shipped it yet.
- Signed proof artifacts as standalone files. A `.raucle-proof.json` file you can publish alongside an agent's tool manifest, the same way you publish a SLSA attestation alongside a build. Buyers verify offline.
The arc
Every release this year has been a piece of one argument: trust in AI infrastructure must be cryptographic or structural, not promised. Receipts gave us per-step attestation. Audit gave us tamper-evident history. Counterfactual replay gave us verifiable hindsight. Multimodal gave us input-surface coverage. The IOC feed gave us social-layer compounding. Today's release adds the deepest property on the list: provable structural impossibility, for the parts of the agent's behaviour where the question is well-formed.
The market's "10,000 attacks tested" line is going to age the way "SSL is good enough" aged.
Discussion: Hacker News · Lobste.rs · /r/MachineLearning · GitHub Issues
Raucle is an open-source AI security project. The runtime detection engine, the provenance receipt format, the input store, the multimodal scanner, the federated feed protocol, the formal-verification provers, and all reference implementations are MIT-licensed.