Superhuman Automated Forecasting

9 May 2026 · Centre for AI Safety – Blog Global

AI forecasting tools approaching crowd-level accuracy signal a near-term capability that APS policy and risk teams may encounter in vendor proposals or decision-support contexts.

Key points

Centre for AI Safety's FiveThirtyNine bot matches crowd-level forecasting accuracy on 177 Metaculus questions.
The tool is pitched as a policy decision-support aid, helping policymakers reduce bias and assess uncertain risks.
Limited APS-specific relevance; useful as context on AI-assisted decision-making capabilities, not an actionable guidance item.

Summary

The Centre for AI Safety has published FiveThirtyNine, a GPT-4o-based forecasting bot that matches the accuracy of crowd forecasters on a 177-question Metaculus evaluation set, with 87.7% accuracy versus the crowd's 87.0%. The bot uses structured web search, reason-weighing, and bias-adjusted probability outputs to respond to arbitrary queries. CAIS positions the tool as a potential aid for policymakers and public discourse, citing advantages in speed and cost over prediction markets. Known limitations include automation bias risk, no fine-tuning, poor performance on very recent events, and no reject option for invalid queries.
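To put the reported 87.7% versus 87.0% in context, the sketch below estimates the sampling uncertainty of an accuracy figure measured on 177 questions. It is a rough illustration, not CAIS's evaluation method: it assumes the questions are independent and uses a simple binomial approximation, and the function and variable names are hypothetical.

```python
import math


def accuracy_standard_error(accuracy: float, n_questions: int) -> float:
    """Binomial standard error of an accuracy estimated from n questions.

    Assumption: questions are treated as independent trials.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


# Figures reported in the summary above.
N_QUESTIONS = 177
BOT_ACCURACY = 0.877
CROWD_ACCURACY = 0.870

bot_se = accuracy_standard_error(BOT_ACCURACY, N_QUESTIONS)
margin_95 = 1.96 * bot_se  # approximate 95% margin of error
gap = BOT_ACCURACY - CROWD_ACCURACY

print(f"Standard error of bot accuracy: {bot_se * 100:.1f} points")
print(f"Approximate 95% margin: +/- {margin_95 * 100:.1f} points")
print(f"Bot minus crowd: {gap * 100:.1f} points")
```

Under these assumptions the 0.7-point gap is much smaller than the margin of error, which supports reading the result as "matches" rather than "beats" the crowd.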

Implications for Australian agencies

Monitor: APS policy and risk teams may want to track the maturation of AI forecasting tools, which are increasingly pitched to government as decision-support or horizon-scanning aids.
Consider: Agencies exploring AI-assisted decision-making could weigh the automation bias risks flagged in this release when evaluating any probabilistic AI outputs used in policy settings.

Implications are AI-generated. Starting points, not advice.