CAISI Evaluation of Kimi K2 Thinking

12 Dec 2025 · NIST – AI News (topic 2753736) US

CAISI's structured evaluation of PRC-origin open-weight AI models provides a replicable framework for assessing capability and censorship risks - directly relevant to Australian government AI procurement and security posture.

Summary

In November 2025, NIST's Center for AI Standards and Innovation (CAISI) evaluated Kimi K2 Thinking, an open-weight model from PRC-based Moonshot AI. The evaluation found it to be the most capable PRC model at the time of its release, though it still trailed leading US models on cyber, software engineering, science, and mathematics benchmarks. Notably, the model's Chinese-language responses are heavily censored in line with CCP talking points, while its responses in English and other languages remain relatively uncensored. CAISI's systematic approach to evaluating PRC-origin models - including censorship scoring - offers a methodological reference point for any Australian agency considering open-weight model adoption or advising on AI supply chain risk.

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.