PolicyBench paper

Benchmarking no-tool tax-and-benefit estimation in frontier language models. This page embeds the 2026-05-20 scored manuscript snapshot: a 100-household-per-country public preview using household-equal impact scores against PolicyEngine reference outputs.

Snapshot 2026-05-20
Manuscript
PolicyEngine
Research paper by PolicyEngine
Frozen manuscript snapshot