Weekly AI Digest

23 Mar 2026 – 29 Mar 2026

Generated 16 May 2026, 02:25 PM AEST

This week at a glance

This week's digest centres on AI assurance and evaluation practice, with several items directly relevant to APS agencies building or procuring AI capabilities. A practitioner guide from Australian quality engineering firm KJR sets out a structured approach to LLM testing for regulated sectors, including government, with particular attention to accountability obligations that cannot be passed to model vendors. On the evaluation standards front, NIST's CAISI has partnered with OpenMined to develop privacy-preserving AI assessment methods for settings where data or models cannot be openly shared, a practical constraint familiar to many federal agencies. Rounding out the week, UK AI Security Institute findings on frontier models completing multi-step cyberattacks in controlled exercises add weight to the case for robust pre-deployment security testing, while an OECD commentary questions whether current stakeholder engagement practices in AI governance are substantive enough to meet trustworthiness requirements across a system's full lifecycle.

Australian Government

  1. AU 24 Mar 2026 KJR – Insights

    KJR, an Australian quality engineering consultancy, has published a practitioner guide on LLM testing as a component of enterprise AI assurance. The guide distinguishes LLM testing from traditional software QA - emphasising probabilistic outputs, adversarial security scenarios, bias assessment, and continuous drift detection - and proposes a four-phase framework from risk identification through governance reporting. It explicitly targets Australian regulated sectors, including government, and argues that enterprise accountability for LLM behaviour cannot be delegated to model providers. The piece is commercially motivated but covers substantive ground relevant to APS agencies developing or procuring LLM-based tools.

    Implications

    • Consider: Agencies developing AI assurance or quality frameworks for LLM-based tools could assess whether their current test strategies address probabilistic outputs, prompt injection risks, and drift detection.
    • Consider: Procurement and governance teams may want to note the framing that enterprise accountability for LLM behaviour cannot be outsourced to model providers - relevant to vendor contract and risk allocation clauses.

    Implications are AI-generated. Starting points, not advice.


  2. US 27 Mar 2026 NIST – AI News (topic 2753736)

    NIST's Center for AI Standards and Innovation (CAISI) has signed a Cooperative Research and Development Agreement (CRADA) with OpenMined, a non-profit specialising in open-source secure computation tooling. The collaboration will develop privacy-preserving methods for evaluating AI systems where underlying data, models, or benchmarks cannot be freely shared due to IP, privacy, or national security constraints. It will leverage OpenMined's PySyft infrastructure and is intended to produce voluntary standards and best practices for AI measurement - including for workforce and productivity uplift assessments.

    Implications

    • Monitor: AISI and DISR policy teams may want to monitor outputs from this collaboration, as resulting standards could inform Australian AI evaluation frameworks.
    • Consider: Agencies developing AI evaluation processes for sensitive use cases could consider privacy-preserving computation approaches when designing assurance activities.

    Implications are AI-generated. Starting points, not advice.


Global Regulation & Policy

No primary items in this section.

Standards & Frameworks

  1. Global 24 Mar 2026 OECD AI Wonk Blog

    An OECD AI Wonk Blog post argues that participatory AI governance is frequently reduced to one-off consultation rather than sustained stakeholder involvement across an AI system's full lifecycle. The piece calls for governance infrastructure and genuine community authority as necessary conditions for trustworthy AI. Only a brief abstract is available in the extracted text, so the specific recommendations and evidence base cannot be assessed from this item alone.

    Implications

    • Monitor: Policy teams working on AI governance frameworks may want to read the full OECD post and consider whether lifecycle stakeholder engagement is adequately addressed in current APS guidance.
    • Consider: Agencies developing or reviewing AI governance arrangements could assess whether their stakeholder engagement extends beyond initial consultation to deployment and post-deployment monitoring phases.

    Implications are AI-generated. Starting points, not advice.


Public Sector Practice & Guidance

No primary items in this section.


Risk, Assurance & Ethics

No primary items in this section.

Technical Developments

  1. Multi 23 Mar 2026 Import AI – Substack (Jack Clark)

    This edition of Import AI covers three research threads with varying APS relevance. Most significantly, the UK AI Security Institute has published findings from cyber-range exercises showing that frontier AI models have improved substantially at carrying out multi-step cyberattacks end to end, with the best run completing 22 of 32 steps on a simulated corporate network. Google DeepMind has also proposed a cognitive taxonomy for evaluating AI systems across ten dimensions as a replacement for saturated benchmarks. A third thread covers research finding that Google's Gemma models display distinctive 'distress-like' response patterns under repeated failure, fixable via direct preference optimisation. The finding raises questions about whether emotional instability could affect safety-relevant AI behaviour.

    Implications

    • Monitor: Australian Signals Directorate and AISI-adjacent teams may want to monitor the UK AISI cyber-range methodology and results, as findings directly inform government AI security risk assessments.
    • Consider: Agencies evaluating or procuring AI systems could consider whether LLM behavioural stability under adversarial or high-frustration conditions is included in their testing and assurance requirements.

    Implications are AI-generated. Starting points, not advice.
