New study warns of risks in AI chatbots giving medical advice

9 Feb 2026 · Oxford Internet Institute – News UK

Rigorous empirical evidence that LLM benchmarks fail to predict real-world medical safety, directly relevant to AI risk assessment in health and human services contexts.

Key points

Summary

A Nature Medicine study from the Oxford Internet Institute, involving nearly 1,300 participants, found that LLMs offered no meaningful improvement over traditional search engines for medical advice and introduced risks through inaccurate, inconsistent, and hard-to-evaluate outputs. Users struggled to know what information to provide, and models gave highly variable answers to slight variations in how a question was asked. Critically, the study shows that standard benchmark evaluations fail to capture real-world performance, a finding with broad implications for how governments and regulators assess AI system safety before deployment in high-stakes domains.

Implications for Australian agencies

Implications are AI-generated. Starting points, not advice.