Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4

Import AI – Substack (Jack Clark)(Global) 20 Apr 2026 58

Automated AI safety research and reduced-safeguard open-weight models raise concrete questions about the pace and reliability of AI governance mechanisms, making both relevant for agencies tracking AI risk.

Key points

Anthropic researchers show AI agents can automate alignment research, outperforming humans on a weak-to-strong supervision benchmark.
A safety evaluation of Chinese open-weight model Kimi K2.5 finds fewer CBRN refusals and greater misaligned behaviour than Western frontier models.
Huawei's HiFloat4 training format outperforms the Western MXFP4 standard on Ascend chips, reflecting export-control-driven efficiency pressure.

Implications for Australian agencies

Monitor: AI safety and risk teams may want to track Anthropic's automated alignment research programme as an early signal of how quickly AI safety R&D itself could be accelerated or destabilised.
Monitor: Agencies assessing AI procurement risk or CBRN-adjacent use cases could note the Kimi K2.5 findings, particularly the low cost of safeguard removal in open-weight models.

Implications are AI-generated. Starting points, not advice — see methodology for how they're framed.

Appeared in: Weekly digest, 20 April 2026