This week's digest is weighted toward evaluation and standards. NIST has released both a formal AI Agent Standards Initiative and a new technical report on statistical methods for AI benchmark assessment; both are worth tracking because they will likely shape ISO/IEC work relevant to Australian procurement and assurance practice. The OECD's new due diligence guidance for responsible AI and two frameworks spotlighted by the MIT AI Risk Repository, on LLM trustworthiness and AI TRiSM, add further reference material for agencies building or refining risk assessment and procurement criteria. On the domestic front, Australia's rise to second place in the OECD Digital Government Index offers a useful data point for practitioners, with the APS AI Plan and the Responsible Use Policy credited with improving Australia's 'proactiveness' score. Taken together, the week's items offer practical inputs for agencies working on AI evaluation methodology, agent security, and risk frameworks. Several open consultation processes, including NIST's RFIs on agent security and identity, give Australian stakeholders a chance to engage before they close in March and April 2026.
The DTA has announced Australia's rise to 2nd place in the OECD's 2025 Digital Government Index, up from 5th in 2023. The result reflects progress across governance, shared platforms, and user-centred service design. The APS AI Plan and the Policy for the Responsible Use of AI in Government are specifically credited with improving Australia's 'proactiveness' score. The DTA signals ongoing work on AI technical standards, generative AI guidance, and data governance as areas for continued investment.
Implications
Consider: Agencies developing AI strategy or capability documentation could reference Australia's OECD ranking as external validation of the current governance framework.
Monitor: The full OECD Digital Government Index findings are due later in 2026 and may contain more granular assessments worth reviewing for AI governance benchmarking.
Implications are AI-generated. Starting points, not advice.
NIST's Center for AI Standards and Innovation (CAISI) is convening virtual workshops in May 2026 to explore barriers and enablers to AI adoption in the healthcare, financial services, and education sectors. The sessions will gather concrete examples of successful and unsuccessful AI implementation efforts, with a particular interest in procurement, evaluation, and integration. Outputs are intended to support the US AI Action Plan and help organisations adopt AI with greater confidence. While the workshops are US-focused, the sector-specific framing mirrors challenges faced by Australian federal and state agencies in the same domains.
Implications
Monitor: Agencies working on AI adoption in health, finance, or education may want to watch CAISI's published outputs from these workshops for transferable frameworks or findings.
NIST's Center for AI Standards and Innovation has formally launched an AI Agent Standards Initiative to address interoperability, security, and identity challenges for autonomous AI agents. The initiative operates across three pillars: facilitating industry-led technical standards and US leadership in international bodies, fostering open-source protocol development, and advancing research into agent security and identity. Two open RFIs, one on AI agent security and one on agent identity and authorisation, close in March and April 2026 respectively, with sector-specific listening sessions to follow. While US-focused, the initiative is explicitly designed to diffuse benefits globally and will likely influence ISO/IEC and other international standards processes relevant to Australia.
Implications
Monitor: DISR, DTA, and AISI policy teams may want to monitor CAISI's forthcoming guidelines and research outputs, as they are likely to inform international AI agent standards Australia will eventually adopt or reference.
Consider: Agencies actively piloting or procuring agentic AI systems could assess whether emerging NIST agent security and identity frameworks should inform their risk assessment and procurement criteria.
NIST's Center for AI Standards and Innovation has released NIST AI 800-3, a technical report proposing statistical frameworks to improve the validity and robustness of AI benchmark evaluations. It formally distinguishes two performance concepts - benchmark accuracy and generalised accuracy - that are commonly conflated, and shows that this conflation can produce misleading comparisons between AI systems. The report demonstrates that generalised linear mixed models (GLMMs) can more precisely quantify uncertainty in LLM performance than prevailing methods. The work is aimed at evaluators, procurers, and practitioners who rely on benchmark results to understand AI system capability.
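To make the distinction concrete, the sketch below compares two standard errors for the same set of results. It is a simplified illustration and not the report's method: it does not fit a GLMM, and it assumes one reading of the two concepts, namely that benchmark accuracy treats the benchmark items as fixed while generalised accuracy treats them as a sample from a larger population of possible items. The function name `accuracy_standard_errors` and the data in `example_results` are hypothetical.

```python
import math


def accuracy_standard_errors(results):
    """Return (mean accuracy, fixed-item SE, sampled-item SE).

    `results` is a list with one entry per benchmark item; each entry is a
    list of 0/1 outcomes from repeated runs of the same model on that item.
    """
    n_items = len(results)
    item_means = [sum(runs) / len(runs) for runs in results]
    mean_acc = sum(item_means) / n_items

    # Fixed-item view (assumed here to correspond to benchmark accuracy):
    # only run-to-run noise within each item contributes to uncertainty.
    within_var = sum(
        p * (1 - p) / len(runs) for p, runs in zip(item_means, results)
    ) / n_items ** 2
    se_fixed = math.sqrt(within_var)

    # Sampled-item view (assumed here to correspond to generalised accuracy):
    # items are draws from a wider population, so the spread of per-item
    # means across items drives the standard error of the overall mean.
    between_var = sum((p - mean_acc) ** 2 for p in item_means) / (n_items - 1)
    se_sampled = math.sqrt(between_var / n_items)

    return mean_acc, se_fixed, se_sampled


# Hypothetical data: four items, four runs each.
example_results = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
mean_acc, se_fixed, se_sampled = accuracy_standard_errors(example_results)
print(f"accuracy={mean_acc:.4f} fixed-item SE={se_fixed:.4f} "
      f"sampled-item SE={se_sampled:.4f}")
```

When items differ widely in difficulty, as in this made-up data, the sampled-item standard error is the larger of the two, so a gap between two systems that looks decisive on a fixed benchmark may not be decisive as a claim about performance on unseen items.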
Implications
Consider: Agencies involved in AI procurement or capability evaluation could consider whether their current vendor assessment criteria account for the distinction between benchmark accuracy and generalised accuracy.
Monitor: AISI and DTA policy teams may want to monitor how NIST AI 800-3 influences international AI evaluation standards, as it could inform Australian guidance on AI performance claims.
The OECD has released Due Diligence Guidance for Responsible AI, aimed at helping businesses identify and manage AI-related risks, meet international standards, and build trustworthy AI value chains. The guidance appears to complement the OECD AI Principles and extend their application into practical business conduct. However, only a short blog summary is available for this item, which limits assessment of its scope, methodology, or specific provisions. Australian agencies engaged in AI governance or procurement policy may find it relevant as a reference point.
Implications
Monitor: Policy teams working on AI procurement or vendor governance frameworks may want to monitor the full OECD guidance document once accessible, to assess alignment with Australian government requirements.
Consider: Agencies referencing international AI standards in their own governance frameworks could consider whether this OECD due diligence guidance warrants citation alongside existing OECD AI Principles references.
MIT's AI Risk Repository has spotlighted a 2023 academic paper by Liu et al. that proposes a comprehensive taxonomy for evaluating large language model alignment and trustworthiness. The framework organises AI challenges into seven major categories — reliability, safety, fairness, resistance to misuse, explainability, social norms, and robustness — with 29 subcategories and guidance on multi-objective evaluation methods. While the paper predates this blog entry by over two years, its inclusion in the MIT Risk Repository signals ongoing relevance as a reference framework. APS agencies developing LLM evaluation criteria or procurement risk assessments may find the taxonomy a useful structured starting point.
Implications
Consider: Agencies developing AI risk assessments or procurement criteria for LLM-based tools could assess whether this taxonomy usefully supplements existing DTA and DISR guidance.
Monitor: AI governance teams may want to monitor the MIT AI Risk Repository's ongoing framework spotlights as a source of structured risk taxonomies for comparative analysis.
The MIT AI Risk Repository has highlighted the AI TRiSM (Trust, Risk and Security Management) framework, drawn from a 2024 peer-reviewed paper by Habbal, Ali, and Abuzaraida. The framework organises AI-related risks into three domains: trust management (bias, discrimination, privacy), risk management (societal manipulation, deepfakes, lethal autonomous weapons), and security management (malicious use, insufficient security measures). Designed to be applied across the full AI system lifecycle, it synthesises academic literature on risk mitigation with particular attention to healthcare and finance sectors. The MIT blog post is a summary only; the underlying paper is the primary reference.
Implications
Consider: APS AI governance practitioners could assess whether the AI TRiSM risk taxonomy complements or overlaps with existing Australian frameworks such as the DISR Responsible AI framework or agency-level risk registers.
Monitor: Teams tracking the MIT AI Risk Repository may want to note this as part of the broader landscape of risk classification frameworks being consolidated internationally.