Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment
A structured LLM evaluation taxonomy could inform how APS agencies assess AI model trustworthiness against the responsible AI policy framework.
Key points
- A 2023 paper by Liu et al. catalogues LLM trustworthiness concerns across 7 categories and 29 subcategories, and MIT's AI Risk Repository has now spotlighted it.
- The taxonomy covers reliability, safety, fairness, resistance to misuse, explainability, social norms, and robustness, which map closely onto APS AI governance concerns.
- This is a secondary blog summary of a 2023 arXiv paper; the taxonomy itself is not new or Australia-specific.
Summary
MIT's AI Risk Repository has spotlighted a 2023 academic paper by Liu et al. that proposes a comprehensive taxonomy for evaluating large language model alignment and trustworthiness. The framework organises AI challenges into seven major categories — reliability, safety, fairness, resistance to misuse, explainability, social norms, and robustness — with 29 subcategories and guidance on multi-objective evaluation methods. While the paper predates this blog entry by over two years, its inclusion in the MIT Risk Repository signals ongoing relevance as a reference framework. APS agencies developing LLM evaluation criteria or procurement risk assessments may find the taxonomy a useful structured starting point.
Implications for Australian agencies
- [Consider] Agencies developing AI risk assessments or procurement criteria for LLM-based tools could assess whether this taxonomy usefully supplements existing DTA and DISR guidance.
- [Monitor] AI governance teams may want to monitor the MIT AI Risk Repository's ongoing framework spotlights as a source of structured risk taxonomies for comparative analysis.
Implications are AI-generated. Starting points, not advice.
"Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment"
Source: MIT AI Risk Repository – Blog
Published: 22 February 2026
URL: https://airisk.mit.edu/blog/trustworthy-llms
Retrieved from SIMS, 18 May 2026.