Submit Your Toughest Questions for Humanity's Last Exam

Centre for AI Safety – Blog (Global) 9 May 2026 38

Benchmark saturation is a real measurement challenge for AI governance: understanding capability ceilings matters for risk assessment. This item, however, is dated.

Key points

CAIS and Scale AI are crowdsourcing expert-level questions to build a frontier AI capability benchmark called Humanity's Last Exam.
The project addresses benchmark saturation: top AI models now score near the ceiling on existing tests such as MMLU.
This item is a call for submissions with a November 2024 deadline, so it is likely already closed, which limits its immediate relevance.

Implications for Australian agencies

Monitor: APS analysts tracking AI capability assessment may want to follow the published results of Humanity's Last Exam as a signal of where frontier models sit relative to expert-level performance.

Implications are AI-generated. Starting points, not advice — see methodology for how they're framed.


Appeared in: Weekly digest, 4 May 2026