PolicyBench paper
This is the landing page for the citable PolicyBench preprint. The manuscript is maintained in a single Quarto source tree and published in both PDF and web formats.
Paper sections
- Introduction and motivation
- Benchmark design and scoring
- US and UK data construction
- Headline results
- Failure modes
- Limitations and next steps