Superhuman Automated Forecasting
AI forecasting tools are approaching crowd-level accuracy, a near-term capability that APS policy and risk teams may encounter in vendor proposals or decision-support contexts.
Key points
- Centre for AI Safety's FiveThirtyNine bot matches crowd-level forecasting accuracy on 177 Metaculus questions.
- The tool is pitched as a policy decision-support aid, helping policymakers reduce bias and assess uncertain risks.
- Limited APS-specific relevance; useful as context on AI-assisted decision-making capabilities, not an actionable guidance item.
Summary
The Centre for AI Safety has published FiveThirtyNine, a GPT-4o-based forecasting bot that matches the accuracy of crowd forecasters on a 177-question Metaculus evaluation set, with 87.7% accuracy versus the crowd's 87.0%. The bot uses structured web search, reason-weighing, and bias-adjusted probability outputs to respond to arbitrary queries. CAIS positions the tool as a potential aid for policymakers and public discourse, citing advantages in speed and cost over prediction markets. Known limitations include automation bias risk, no fine-tuning, poor performance on very recent events, and no reject option for invalid queries.
Implications for Australian agencies
- [Monitor] APS policy and risk teams may want to monitor the maturation of AI forecasting tools as they are increasingly pitched to government as decision-support or horizon-scanning aids.
- [Consider] Agencies exploring AI-assisted decision-making could consider automation bias risks flagged in this release when evaluating any probabilistic AI outputs used in policy settings.
Implications are AI-generated. Starting points, not advice.
"Superhuman Automated Forecasting"
Source: Centre for AI Safety – Blog
Published: (undated)
URL: https://safe.ai/blog/forecasting
Retrieved from SIMS, 18 May 2026.