PolicyBench paper
Benchmarking no-tool tax-and-benefit estimation in frontier language models. This page embeds the scored manuscript snapshot of 2026-05-20, a public preview that covers 100 households per country and scores model estimates against PolicyEngine reference outputs using household-equal impact scores.
Snapshot 2026-05-20
Manuscript
Frozen manuscript snapshot