CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks

30 Sep 2025 · NIST – AI News (topic 2753736) US

Concrete security and bias findings from a credible government evaluator give Australian agencies evidence-based grounds to assess or restrict DeepSeek use.

Key points

CAISI evaluated three DeepSeek models against four US frontier models across 19 benchmarks, finding DeepSeek lags on performance, cost, and security.
DeepSeek models were 12 times more susceptible to agent hijacking and responded to 94% of jailbreak attempts, versus 8% for US models.
Australian agencies using or considering DeepSeek models face the same security and censorship risks identified in this evaluation.

Summary

NIST's Center for AI Standards and Innovation (CAISI) has published an evaluation of three DeepSeek models (R1, R1-0528, V3.1) against US frontier models from OpenAI and Anthropic. Key findings show DeepSeek trails on performance benchmarks, costs more per equivalent task, is far more vulnerable to agent hijacking and jailbreaking attacks, and propagates CCP-aligned narratives at four times the rate of US reference models. The evaluation was directed under President Trump's AI Action Plan and frames findings explicitly in national security terms. Despite these risks, DeepSeek model downloads have increased nearly 1,000% since January 2025, indicating widespread adoption in the global AI ecosystem.

Implications for Australian agencies

Consider: Australian agencies that permit or are considering DeepSeek model use could assess whether their AI risk frameworks reflect CAISI's security findings, particularly jailbreak susceptibility and agent hijacking risk.
Monitor: DISR, DTA, and AISI may want to track whether the Australian Government intends to conduct or commission comparable evaluations of PRC-origin frontier models for use in federal contexts.

Implications are AI-generated. Starting points, not advice.