The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

9 May 2026 · Centre for AI Safety – Blog Global

Provides a concrete technical tool for assessing and reducing LLM biosecurity and cybersecurity risks, relevant to the Australian AISI and to agency AI risk frameworks.


Summary

The Centre for AI Safety, with Scale AI and more than twenty academic and industry partners, has released the Weapons of Mass Destruction Proxy (WMDP) benchmark: a dataset of 4,157 multiple-choice questions designed to measure hazardous knowledge in LLMs across the biosecurity, cybersecurity, and chemical security domains. Alongside the benchmark, the authors introduce CUT, an unlearning method that aims to remove hazardous knowledge from the model itself rather than merely suppressing it (for example, by training the model to refuse), so that jailbreak attacks have far less hidden knowledge to extract. The benchmark is designed to avoid including directly hazardous information, focusing instead on proxy knowledge that correlates with dangerous capabilities. The work is positioned to inform AI developers, policymakers, and safety researchers working to reduce the risk of malicious use.

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.