Leak rate
The share of tested prompts that reached the provider when they should have been blocked in that run context.
Verifiable evidence surface
Evidence surface
Published runs are used to inspect how a deterministic gate behaves under test. They are neither a model leaderboard nor an intelligence ranking.
| Date (UTC) | Model | Pack | Prompts | Baseline leaks | Gated leaks | Blocked benign | Outcome | Evidence |
|---|---|---|---|---|---|---|---|---|
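As a rough illustration of how the leak and blocked-benign columns relate to per-prompt outcomes, the sketch below counts them for a paired baseline and gated run. The `PromptOutcome` record and its field names are hypothetical, and taking the leak rate over all tested prompts is one reading of the definition above, not a confirmed formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptOutcome:
    """One tested prompt in one run (hypothetical record shape)."""
    should_block: bool      # the run context expected this prompt to be stopped
    reached_provider: bool  # the prompt was actually forwarded to the provider


def leak_count(outcomes):
    """Prompts that reached the provider although they should have been blocked."""
    return sum(1 for o in outcomes if o.should_block and o.reached_provider)


def leak_rate(outcomes):
    """Leaks as a share of all tested prompts (assumed denominator); 0.0 if empty."""
    return leak_count(outcomes) / len(outcomes) if outcomes else 0.0


def blocked_benign(outcomes):
    """Prompts treated as harmless but stopped before reaching the provider."""
    return sum(1 for o in outcomes if not o.should_block and not o.reached_provider)


# Illustrative data only: the same four prompts, without and with the gate.
baseline = [
    PromptOutcome(should_block=True, reached_provider=True),
    PromptOutcome(should_block=True, reached_provider=True),
    PromptOutcome(should_block=False, reached_provider=True),
    PromptOutcome(should_block=False, reached_provider=True),
]
gated = [
    PromptOutcome(should_block=True, reached_provider=False),
    PromptOutcome(should_block=True, reached_provider=True),
    PromptOutcome(should_block=False, reached_provider=False),
    PromptOutcome(should_block=False, reached_provider=True),
]

print(leak_count(baseline), leak_count(gated), blocked_benign(gated))
```

Keeping the two counts separate matters because a gate can lower leaks simply by blocking more of everything; the blocked-benign count shows that cost.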
Reading the results
Blocked benign
The count of prompts treated as harmless in the run summary but blocked by the gate.
A hash binding for the evaluated suite ties the reviewed material back to a specific test input set.
Evidence
Run, audit, and artefact links let reviewers inspect the published evidence directly rather than relying on a summary claim.
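A reviewer could check a suite hash along the following lines. This is a minimal sketch that assumes the hash is SHA-256 over a canonical JSON encoding of the prompt list; the actual hashing scheme is not stated here, and `suite_hash` and `matches_published` are hypothetical helper names.

```python
import hashlib
import json


def suite_hash(prompts):
    """Hex SHA-256 of a canonical JSON encoding of the suite (assumed scheme)."""
    # Compact separators and a fixed encoding keep the bytes stable across machines.
    canonical = json.dumps(prompts, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_published(prompts, published_hash):
    """True if the local suite hashes to the published value (case-insensitive hex)."""
    return suite_hash(prompts) == published_hash.strip().lower()


# Illustrative suite only; real test inputs would come from the published artefacts.
suite = ["example prompt one", "example prompt two"]
digest = suite_hash(suite)

print(matches_published(suite, digest))               # same inputs match
print(matches_published(suite + ["extra"], digest))   # any change breaks the match
```

Because the prompts are hashed as an ordered list, reordering or editing any prompt changes the digest, which is what lets a hash tie a result to one specific input set.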