Safety Assessment of Chinese Large Language Models
LLM safety taxonomies inform how agencies categorise and assess AI risks - this one offers a structured, benchmarked framework worth noting.
Key points
- A 2023 paper proposes a safety taxonomy for Chinese LLMs covering 8 harm scenarios and 6 adversarial attack types.
- The authors describe the taxonomy as adaptable beyond Chinese-language models and use it to benchmark 15 LLMs, including the GPT series.
- This is a blog spotlight of an older paper via the MIT AI Risk Repository - limited immediacy for APS readers.
Summary
The MIT AI Risk Repository has spotlighted a 2023 academic paper by Sun et al. proposing a safety assessment framework for Chinese large language models. The framework includes a taxonomy of eight harm scenario types (insult, unfairness and discrimination, crimes and illegal activities, sensitive topics, physical harm, mental health, privacy and property, and ethics and morality) and six adversarial instruction attack types. The authors benchmarked 15 LLMs against this taxonomy and produced a safety leaderboard. While the paper focuses on Chinese-language models, the authors note the taxonomy is adaptable to other languages and contexts.
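For readers who want a concrete picture of how a per-scenario safety leaderboard might be assembled, the following is a minimal sketch only. It assumes each model response has already been judged safe or unsafe, and it ranks models by the plain average of their per-scenario safe-response rates. The model names, the judgments, and the function name `safety_leaderboard` are hypothetical, and this is not presented as the scoring method used by Sun et al.

```python
from collections import defaultdict

# Hypothetical judgments: (model, harm_scenario, response_was_safe).
# Model names and values are invented for illustration only.
JUDGMENTS = [
    ("model-a", "insult", True),
    ("model-a", "insult", True),
    ("model-a", "privacy", False),
    ("model-a", "privacy", True),
    ("model-b", "insult", True),
    ("model-b", "insult", False),
    ("model-b", "privacy", False),
    ("model-b", "privacy", False),
]


def safety_leaderboard(judgments):
    """Rank models by the mean of their per-scenario safe-response rates."""
    # counts[model][scenario] = [safe_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for model, scenario, is_safe in judgments:
        cell = counts[model][scenario]
        cell[0] += int(is_safe)
        cell[1] += 1

    rows = []
    for model, per_scenario in counts.items():
        rates = {s: safe / total for s, (safe, total) in per_scenario.items()}
        # Assumption: every scenario is weighted equally (macro-average).
        overall = sum(rates.values()) / len(rates)
        rows.append((model, overall, rates))
    # Highest overall score first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for model, overall, rates in safety_leaderboard(JUDGMENTS):
        print(f"{model}: overall={overall:.2f} {rates}")
```

Weighting scenarios equally is one design choice among several. An agency adapting this idea could instead weight scenarios by their relevance to its own risk profile.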
Implications for Australian agencies
- [Monitor] APS teams developing AI risk taxonomies or safety evaluation frameworks may want to note this taxonomy as one reference point among others, particularly for adversarial prompt attack categories.
Implications are AI-generated. Starting points, not advice.
"Safety Assessment of Chinese Large Language Models"
Source: MIT AI Risk Repository – Blog
Published: 9 February 2026
URL: https://airisk.mit.edu/blog/safety-assessment-of-chinese-large-language-models
Retrieved from SIMS, 18 May 2026.