CAISI Evaluation of DeepSeek V4 Pro

1 May 2026 · NIST – AI News (topic 2753736) US

An independent US government evaluation of a leading PRC AI model challenges the vendor's self-reported results and signals the value of third-party capability assessment, a practice Australian agencies may wish to reference.

Key points

Summary

NIST's Center for AI Standards and Innovation (CAISI) independently evaluated DeepSeek V4 Pro in April 2026. It found the model to be the most capable PRC AI model assessed to date, but trailing the US frontier by approximately 8 months when measured against non-public, held-out benchmarks. DeepSeek's own self-reported evaluations present a more favourable picture, suggesting rough parity with frontier US models; CAISI attributes the discrepancy to benchmark selection. DeepSeek V4 Pro was more cost-efficient than the comparable US reference model (GPT-5.4 mini) on five of seven benchmarks. The evaluation demonstrates an emerging US government practice of independent, rigorous model assessment that uses proprietary benchmarks to resist contamination and gaming.

Implications for Australian agencies

Implications are AI-generated. Treat them as starting points, not advice.