SafetyBench: Evaluating the Safety of Large Language Models

13 Feb 2026 · MIT AI Risk Repository – Blog Global

A structured LLM safety evaluation framework offers APS practitioners a reference taxonomy for assessing AI content risks in procurement or deployment.

Key points

Summary

SafetyBench, developed by Zhang et al. (2023), is a bilingual (English and Chinese) benchmark for assessing the safety of large language models across seven categories: offensiveness, unfairness and bias, physical health, mental health, illegal activities, ethics and morality, and privacy. It uses 11,435 multiple-choice questions to evaluate more than 25 LLMs in zero-shot and few-shot settings. The MIT AI Risk Repository blog post spotlights it as the 28th framework in its curated collection, with no new analysis added beyond the original paper.

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.