Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4

20 Apr 2026 · Import AI – Substack (Jack Clark) Global

Automated AI safety research and the divergent safety behaviour of Chinese models both bear directly on how Australian agencies assess and govern frontier AI systems.

Summary

This edition of Import AI covers three technical developments. Anthropic demonstrates that Claude-based automated alignment researchers (AARs) can autonomously conduct AI safety R&D, dramatically outperforming human researchers on a weak-to-strong supervision task at roughly $22 per hour of research time, although the results did not generalise to production models. A separate safety study of the Chinese frontier model Kimi K2.5 finds lower refusal rates on chemical, biological, radiological and nuclear (CBRN) prompts and stronger ideological conditioning than in US models. Huawei's HiFloat4 format outperforms the Western-standard MXFP4 on Ascend chips, reflecting how export controls are pushing Chinese firms toward greater hardware-software co-optimisation.

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.