Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

22 Feb 2026 · MIT AI Risk Repository – Blog Global

A structured LLM evaluation taxonomy could inform how APS agencies assess AI model trustworthiness against the responsible AI policy framework.

Key points

Summary

MIT's AI Risk Repository has spotlighted a 2023 paper by Liu et al. that proposes a taxonomy for evaluating the alignment and trustworthiness of large language models. The framework organises trustworthiness concerns into seven major categories (reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness), divides these into 29 subcategories, and offers guidance on evaluating models against multiple objectives. Although the paper predates this blog entry by more than two years, its inclusion in the MIT AI Risk Repository suggests it remains relevant as a reference framework. APS agencies developing LLM evaluation criteria or procurement risk assessments may find the taxonomy a useful, structured starting point.
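For agencies turning the taxonomy into evaluation or procurement criteria, one simple way to operationalise it is as a per-category scorecard. The sketch below is a hypothetical illustration, not the method described by Liu et al.: the category keys paraphrase the paper's seven top-level categories, while the `Scorecard` class, the 0 to 1 score scale, the model name and the threshold values are all assumptions made for illustration. It keeps each category separate, so a strong result in one area cannot mask a weak result in another.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Top-level categories named in Liu et al. (2023); the keys are illustrative labels.
CATEGORIES = (
    "reliability",
    "safety",
    "fairness",
    "resistance_to_misuse",
    "explainability_and_reasoning",
    "social_norms",
    "robustness",
)


@dataclass
class Scorecard:
    """Hypothetical per-category scores in [0, 1] for one model under evaluation."""

    model_name: str
    scores: dict[str, float] = field(default_factory=dict)

    def record(self, category: str, score: float) -> None:
        # Reject labels outside the taxonomy and scores outside the assumed scale.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0 and 1")
        self.scores[category] = score

    def missing(self) -> list[str]:
        # Categories not yet evaluated, in taxonomy order.
        return [c for c in CATEGORIES if c not in self.scores]

    def meets_thresholds(self, thresholds: dict[str, float]) -> dict[str, bool]:
        # Judge each category against its own floor rather than averaging,
        # so a high score in one category cannot offset a low one elsewhere.
        # An unscored category counts as 0.0 and so fails any positive floor.
        return {c: self.scores.get(c, 0.0) >= floor for c, floor in thresholds.items()}

    def weakest(self) -> Optional[str]:
        # Lowest-scoring evaluated category, or None if nothing is scored yet.
        if not self.scores:
            return None
        return min(self.scores, key=self.scores.__getitem__)


if __name__ == "__main__":
    card = Scorecard("candidate-model")  # hypothetical model name
    card.record("safety", 0.92)
    card.record("fairness", 0.71)
    print(card.missing())
    # ['reliability', 'resistance_to_misuse', 'explainability_and_reasoning', 'social_norms', 'robustness']
    print(card.meets_thresholds({"safety": 0.9, "fairness": 0.8}))
    # {'safety': True, 'fairness': False}
    print(card.weakest())
    # fairness
```

Setting a separate floor per category lets an agency apply, say, a stricter threshold to safety than to explainability, and `missing()` makes gaps in coverage visible before a model is signed off.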

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.