Import AI 444: LLM societies; Huawei makes kernels with AI; ChipBench
Emerging research on how reasoning models actually work internally may eventually inform AI assurance and explainability approaches in government contexts.
Key points
- Google-affiliated research finds LLM reasoning models simulate multiple internal personas, termed 'societies of thought.'
- ChipBench benchmark reveals frontier AI models perform poorly on real-world chip design tasks in Verilog.
- Both findings are foundational AI research with limited direct APS operational relevance at this stage.
Summary
This edition of Import AI covers two research developments. First, a Google-affiliated study finds that advanced LLM reasoning models appear to simulate multiple internal perspectives or personas when solving hard problems - a phenomenon the authors call 'societies of thought' - observed in DeepSeek-R1 and QwQ-32B. Second, researchers from UC San Diego and Columbia introduce ChipBench, a more demanding benchmark for AI-assisted chip design in Verilog, finding that no current frontier model performs well on realistic industrial tasks. Both items are primarily of interest to AI researchers and technical practitioners rather than APS governance or policy staff.
Implications for Australian agencies
- [Monitor] AI assurance and governance teams may want to monitor emerging research on LLM internal reasoning structures, as it could eventually inform explainability and transparency requirements.
Implications are AI-generated. Starting points, not advice.
"Import AI 444: LLM societies; Huawei makes kernels with AI; ChipBench"
Source: Import AI – Substack (Jack Clark)
Published: 9 February 2026
URL: https://importai.substack.com/p/import-ai-444-llm-societies-huawei
Retrieved from SIMS, 18 May 2026.