Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

18 Sep 2024 · MIT AI Risk Repository – Blog Global

A structured taxonomy of LLM safety risks provides a reference baseline for APS agencies developing AI risk registers or evaluation criteria.

Summary

This MIT AI Risk Repository summary covers an academic survey that systematically catalogues safety risks in generative language models across seven categories: toxic content, discrimination, ethics and morality, controversial opinions, misleading information, privacy leakage, and malicious use. The paper also reviews safety evaluation methodologies, including adversarial testing and preference-based assessment, and improvement strategies spanning the model development lifecycle. Although the paper itself dates from 2023, its inclusion in the MIT repository signals ongoing academic consensus-building around LLM risk classification. The taxonomy is broadly compatible with frameworks used in Australian AI governance contexts, including the DISR Responsible AI framework.
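
For agencies that keep a risk register in structured form, the seven categories above could be encoded as a fixed vocabulary. The following is a minimal sketch only: the names RiskCategory, RegisterEntry and uncovered_categories are hypothetical illustrations and do not come from the survey, the MIT repository, or any DISR framework.

```python
from dataclasses import dataclass
from enum import Enum


class RiskCategory(Enum):
    """The seven risk categories named in the survey summary."""
    TOXIC_CONTENT = "toxic content"
    DISCRIMINATION = "discrimination"
    ETHICS_AND_MORALITY = "ethics and morality"
    CONTROVERSIAL_OPINIONS = "controversial opinions"
    MISLEADING_INFORMATION = "misleading information"
    PRIVACY_LEAKAGE = "privacy leakage"
    MALICIOUS_USE = "malicious use"


@dataclass
class RegisterEntry:
    """One hypothetical risk register row (field names are assumptions)."""
    system: str
    category: RiskCategory
    description: str


def uncovered_categories(entries):
    """Return the categories that no register entry addresses yet."""
    covered = {entry.category for entry in entries}
    return [category for category in RiskCategory if category not in covered]


# Example: a register with a single entry leaves six categories unassessed.
register = [
    RegisterEntry(
        system="public enquiry chatbot",
        category=RiskCategory.PRIVACY_LEAKAGE,
        description="Model may reveal personal details from its inputs.",
    )
]
for category in uncovered_categories(register):
    print("Not yet assessed:", category.value)
```

A gap check like this only shows which categories have no entry at all; it says nothing about whether the existing entries are adequate.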

Implications for Australian agencies

Implications are AI-generated. They are starting points, not advice.