Model Evaluation for Extreme Risks

6 Feb 2026 · MIT AI Risk Repository – Blog Global

Dangerous-capability taxonomies like this one inform how safety institutes—including Australia's AISI—design pre-deployment evaluation criteria.

Key points

Summary

MIT's AI Risk Repository spotlights a 2023 paper by Shevlane, Farquhar, Garfinkel and co-authors, which proposes that model evaluation can help address extreme AI risks by assessing both a model's dangerous capabilities and its alignment. The framework identifies nine capability categories (including cyber-offense, deception, persuasion, weapons acquisition, and self-proliferation) through which general-purpose AI systems could cause catastrophic harm. The paper outlines how such evaluations could be embedded in safety and governance processes for training and deployment. This MIT blog post is a summary entry in a broader risk framework repository, not new primary research.

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.