Weekly Digest 16 Feb 2026

This week at a glance

This week's digest is weighted toward evaluation and standards. NIST has launched an AI Agent Standards Initiative and released a technical report on statistical methods for AI benchmark evaluation. The initiative in particular is worth tracking, as standards developed under it may influence international standards work, including ISO/IEC, that is relevant to Australian procurement and assurance practice. The OECD's new Due Diligence Guidance for Responsible AI, along with two frameworks spotlighted by the MIT AI Risk Repository (on LLM trustworthiness and AI TRiSM), adds reference material for agencies building or refining risk assessment and procurement criteria. On the domestic front, Australia's rise to second place in the OECD Digital Government Index offers a useful data point for practitioners, with the APS AI Plan and the Policy for the Responsible Use of AI in Government credited as partial contributors to the improved 'Proactiveness' score. Taken together, the week's items offer practical inputs for agencies working on AI evaluation methodology, agent security, and risk frameworks. NIST's two open requests for input on AI agent security and identity, closing in March and April 2026, also give Australian stakeholders an opportunity to contribute.

Headlines

AU Gov · Australia rises to second globally in the OECD Digital Government Index
Global · Announcing the "AI Agent Standards Initiative" for Interoperable and Secure Innovation
Standards · New Report: Expanding the AI Evaluation Toolbox with Statistical Models
Risk · Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Primary source commentary

Australian Government (2 items)

Digital Transformation Agency (AU) 16 Feb 2026

Australia rises to second globally in the OECD Digital Government Index

The DTA has announced Australia's rise to 2nd place of 42 countries in the OECD's 2025 Digital Government Index, up from 5th in 2023, with an overall score of 88%. The index assesses dimensions including digital by design, user-driven approaches, government as a platform, and proactiveness. Australia's improved 'Proactiveness' ranking (from 7th to 5th) is partly attributed to the APS AI Plan and the Policy for the Responsible Use of AI in Government. The DTA notes forthcoming priorities including the AI technical standard and generative AI guidance. Full OECD findings are scheduled for release later in 2026.

Key points

Australia ranked 2nd of 42 countries in the OECD 2025 Digital Government Index with a score of 88%.
The AI Plan for the APS and the Policy for the Responsible Use of AI in Government are cited as partial contributors to the improved 'Proactiveness' dimension ranking.
AI governance is one thread in a broader digital government result; the item is primarily a DTA achievement announcement.

Implications

Monitor: Policy teams may want to monitor the full OECD Digital Government Index findings, due later in 2026, for more granular benchmarking data.
Consider: Agencies developing AI strategy or governance materials could consider referencing Australia's OECD ranking as contextual evidence of whole-of-government digital maturity, noting that AI policy is only one contributor to the result.


NIST – AI News (US) 17 Feb 2026

CAISI to Host Listening Sessions on Barriers to AI Adoption

NIST's Center for AI Standards and Innovation (CAISI) is convening virtual workshops throughout May 2026 to explore barriers and enablers to AI adoption across the healthcare, financial services, and education sectors. Submissions were sought from practitioners with direct procurement, evaluation, and integration experience. The insights gathered will directly inform CAISI's AI adoption guidance work, conducted in furtherance of the US AI Action Plan. While participation is US-focused, findings and outputs from these workshops may offer reusable evidence on sector-specific AI adoption challenges relevant to Australian policy development.

Key points

NIST's CAISI is hosting virtual workshops in May 2026 on AI adoption barriers in healthcare, finance, and education.
Findings will inform CAISI's AI adoption guidance under the US AI Action Plan; outputs may have broader international relevance.
Limited direct relevance to Australian federal agencies, as the sector analysis is US-focused, though emerging findings are worth monitoring.

Implications

Monitor: Policy teams working on AI adoption strategy may want to monitor CAISI's published outputs from these workshops for transferable insights on sector-specific barriers.


Global Regulation & Policy (1 item)

NIST – AI News (US) 17 Feb 2026

Announcing the "AI Agent Standards Initiative" for Interoperable and Secure Innovation

NIST's Center for AI Standards and Innovation (CAISI) has launched the AI Agent Standards Initiative to address interoperability, security, and identity challenges for autonomous AI agents. The initiative operates across three pillars: facilitating industry-led standards development and US leadership in international standards bodies; supporting open-source protocol development; and advancing research in AI agent security and identity. NIST will publish guidelines and research deliverables over the coming months and is currently seeking stakeholder input via two open RFIs. The initiative is notable because standards developed under it may influence ISO/IEC and other international frameworks that Australian agencies reference.

Key points

NIST's CAISI launches an AI Agent Standards Initiative focused on interoperability, security, and identity for autonomous AI agents.
The initiative is intended to shape US positions in international standards bodies, which could in turn influence Australian standards adoption and procurement conditions.
Two open RFIs (closing 9 March and 2 April) invite stakeholder input on AI agent security and identity frameworks.

Implications

Monitor: Standards and AI governance teams may want to monitor CAISI's forthcoming guidelines and research outputs, as they are likely to inform international standards bodies relevant to Australian government AI procurement and deployment.
Consider: Agencies exploring or deploying agentic AI use cases could consider reviewing CAISI's AI Agent Security RFI and Identity Concept Paper as early signals of the security and interoperability baselines that may emerge.


Standards & Frameworks (2 items)

NIST – AI News (US) 19 Feb 2026

New Report: Expanding the AI Evaluation Toolbox with Statistical Models

NIST's Center for AI Standards and Innovation has released NIST AI 800-3, a technical report proposing improved statistical methods for AI benchmark evaluations. The report formalises two distinct performance measures, benchmark accuracy and generalized accuracy, and demonstrates how generalized linear mixed models (GLMMs) can more precisely quantify uncertainty in LLM performance assessments. The framework was applied to 22 frontier LLMs across three common benchmarks (GPQA-Diamond, BIG-Bench Hard, and Global-MMLU Lite). The work is positioned as a contribution to more principled, rigorous AI evaluation practice for evaluators, procurers, and developers.

Key points

NIST CAISI published AI 800-3, introducing statistical frameworks to improve AI benchmark evaluation validity.
The report distinguishes 'benchmark accuracy' from 'generalized accuracy', a distinction relevant to procurement and assurance decisions in Australian agencies.
Generalized linear mixed models (GLMMs) are proposed as a way to quantify uncertainty in AI evaluation results more precisely than current common practice.
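To make the benchmark vs generalized accuracy distinction more concrete, the Python sketch below contrasts two standard errors for the same accuracy estimate. It is a simplified illustration under stated assumptions, not the GLMM method in NIST AI 800-3: each benchmark item is assumed to be scored correct (1) or incorrect (0) over several repeated runs of one model; "benchmark accuracy" is treated as accuracy on this fixed set of items, so uncertainty comes only from run-to-run variation; and "generalized accuracy" is treated as expected accuracy on the wider population of items the benchmark is drawn from, so which items happened to be sampled also adds uncertainty. The function name and toy data are hypothetical.

```python
import math
from statistics import mean, variance


def accuracy_with_standard_errors(scores: list[list[int]]) -> dict[str, float]:
    """Estimate accuracy and two standard errors from repeated benchmark runs.

    scores[i][r] is 1 if the model answered item i correctly on run r, else 0.
    Hypothetical, simplified illustration; NOT the GLMM approach in NIST AI 800-3.
    """
    n_items = len(scores)
    n_runs = len(scores[0]) if scores else 0
    if n_items < 2 or n_runs < 2:
        raise ValueError("need at least 2 items and 2 runs per item")
    if any(len(row) != n_runs for row in scores):
        raise ValueError("every item must have the same number of runs")

    # Per-item accuracy across repeated runs of the same model.
    item_means = [mean(row) for row in scores]
    point_estimate = mean(item_means)

    # "Benchmark accuracy" view (assumption): items are fixed, so the only
    # uncertainty is run-to-run variation within each item.
    within_item_var = sum(variance(row) / n_runs for row in scores)
    se_benchmark = math.sqrt(within_item_var) / n_items

    # "Generalized accuracy" view (assumption): items are a random sample from
    # a larger population, so spread across items also adds uncertainty.
    se_generalized = math.sqrt(variance(item_means) / n_items)

    return {
        "accuracy": point_estimate,
        "se_benchmark": se_benchmark,
        "se_generalized": se_generalized,
    }


if __name__ == "__main__":
    # Toy data (hypothetical): 4 items, each run 3 times.
    toy_scores = [
        [1, 1, 1],
        [1, 0, 1],
        [0, 0, 0],
        [1, 1, 0],
    ]
    for name, value in accuracy_with_standard_errors(toy_scores).items():
        print(f"{name}: {value:.3f}")
    # Prints, one per line: accuracy: 0.583, se_benchmark: 0.118, se_generalized: 0.210
```

On the toy data the generalized standard error is noticeably larger, because it also counts variation between items. A GLMM of the kind the report proposes models these sources of variation within a single statistical model; this sketch does not attempt that.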

Implications

Monitor: Agencies with AI evaluation or assurance responsibilities may want to monitor NIST AI 800-3 as a reference when assessing the statistical rigour of vendor-supplied AI benchmark results.
Consider: Teams developing AI procurement criteria or evaluation frameworks could consider whether the distinction between benchmark accuracy and generalized accuracy should be reflected in how vendors are asked to report AI system performance.


OECD AI Wonk Blog (Global) 19 Feb 2026

The OECD’s new responsible AI guidance: A compass for businesses in a complex terrain

The OECD has published Due Diligence Guidance for Responsible AI, described as a resource to help businesses manage AI-related risks, align with global standards, and develop trustworthy AI value chains. The announcement appears on the OECD AI Wonk Blog, though the extracted text is limited to a brief description. Given Australia's adherence to the OECD AI Principles and their influence on domestic frameworks such as the APS AI Plan and the Policy for the Responsible Use of AI in Government, this guidance is likely to interest agencies tracking international standards development.

Key points

OECD has released Due Diligence Guidance for Responsible AI targeting business AI risk management.
The guidance aims to help organisations meet global standards and build trustworthy AI value chains.
Extracted text is minimal; substantive content requires direct engagement with the source.

Implications

Monitor: Policy and governance teams may want to monitor this guidance for content that intersects with Australia's responsible AI frameworks or procurement due diligence requirements.
Consider: Agencies developing AI supplier or value chain risk assessments could consider whether OECD due diligence criteria are compatible with, or additive to, existing Commonwealth approaches.


Risk, Assurance & Ethics (2 items)

MIT AI Risk Repository – Blog (Global) 22 Feb 2026

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

The MIT AI Risk Repository has spotlighted a 2023 academic paper by Liu et al. proposing a comprehensive taxonomy for evaluating LLM trustworthiness across seven dimensions: reliability, safety, fairness, resistance to misuse, explainability and reasoning, social norms, and robustness. Each dimension contains subcategories covering risks such as hallucination, sycophancy, prompt attacks, and cultural insensitivity. The paper also provides a guideline for multi-objective evaluation using automated and templated question generation. The blog post itself is a brief summary adding no original analysis; the signal value is primarily the taxonomy structure as an input to AI risk or evaluation frameworks.

Key points

A 2023 academic paper proposes a taxonomy of 7 major LLM trustworthiness categories covering 29 subcategories.
The MIT AI Risk Repository spotlights this as one of 30 risk frameworks it has catalogued; the taxonomy may be useful for APS risk inventory work.
The paper itself dates from 2023; the blog post adds no new analysis beyond the repository spotlight.

Implications

Consider: Agencies developing AI risk registers or evaluation criteria could consider whether this taxonomy's seven dimensions map usefully onto their existing risk categorisation structures.
Monitor: Policy teams tracking the MIT AI Risk Repository may want to monitor the full set of 30 frameworks it has catalogued for emerging patterns in AI risk classification.


MIT AI Risk Repository – Blog (Global) 19 Feb 2026

Artificial Intelligence Trust, Risk and Security Management (AI TRiSM)

The MIT AI Risk Repository's blog spotlights a 2024 academic paper by Habbal, Ali, and Abuzaraida reviewing the AI Trust, Risk, and Security Management (AI TRiSM) framework. The framework organises AI risks into three pillars: trust management (bias, privacy), risk management (societal manipulation, deepfakes, autonomous weapons), and security management (malicious use, insufficient controls). It is designed to apply across the full AI system lifecycle, with particular focus on the healthcare and finance sectors. The blog post is a brief summary that points readers to the underlying publication rather than offering original analysis.

Key points

MIT AI Risk Repository spotlights the AI TRiSM framework covering trust, risk, and security management across AI lifecycles.
The framework's three pillars cover risks including bias, privacy, deepfakes, societal manipulation, autonomous weapons, and malicious use.
This is a literature synthesis blog post; the underlying 2024 academic paper carries more analytical depth.

Implications

Consider: APS AI governance teams could consider mapping AI TRiSM's risk taxonomy against the obligations in the Policy for the Responsible Use of AI in Government to identify coverage gaps or useful framing.
Monitor: Agencies tracking international risk frameworks may want to monitor the MIT AI Risk Repository's ongoing series, which systematically catalogues frameworks potentially useful for comparative governance work.


Implications are AI-generated. They are starting points, not advice; see the methodology for how they are framed.