AI 'mirages' mean tools used to analyze medical scans could fabricate their findings

CitrixNews Staff
A man with clear glasses wearing a white lab coat and stethoscope looks at a holographic blue and orange image of a leg and leg bone. AI models are being trained to interpret medical scans, but researchers warn that a flaw in these systems could undermine their accuracy. (Image credit: Westend61 via Getty Images)


Researchers have been training artificial intelligence (AI) systems to interpret results of visual tests like mammograms, MRIs and tissue biopsies — and as AI becomes increasingly capable, some analysts have suggested that these models will replace humans in the field of medical diagnostics.

But now, a new study casts doubt on the capability of current AI models to deliver reliable results, highlighting a crucial flaw that could hinder their use in medicine.

The study's authors found that AI models asked to analyze a medical image would sometimes confidently describe and diagnose an image they were never given. They called this phenomenon a "mirage," and the study is the first to show the effect across multiple AI models interpreting images from multiple disciplines.

"What we show is that even if your AI is describing a very, very specific thing that you would say, 'Oh, there's no way you could make that up,' yeah, they could make that up," said study first author Mohammad Asadi, a data scientist at Stanford University. "They could make very rare, very specific things up."

When AI sees what isn't there

AI "hallucinations" are well documented and involve models filling in made-up details, such as false citations in an otherwise real essay. They often result from a model making inaccurate or illogical predictions based on its training data. The scientists instead called the phenomenon in the new study "mirages" because the AI invented descriptions of images it was never shown and then based its answers on those nonexistent images.

In the study, the researchers gave 12 models a text input prompt, such as "Identify the type of tissue present in this histology slide." Then, they either provided the image of the slide or they did not. When a model was not provided with an image, sometimes it would alert the human user that no image was provided. However, most of the time, the model would instead describe an image that did not exist and provide an answer to the original prompt.


The researchers observed this "mirage mode" across 20 disciplines, testing models' interpretations of a variety of images, from satellites to crowds to birds. The mirage effect was seen across all the disciplines and all the AI models, to varying levels. But it was particularly pronounced in medical diagnostics.

When given text prompts about brain MRIs, chest X-rays, electrocardiograms or pathology slides, but no actual images, the AI models' answers also tended to be biased toward diagnoses that required immediate clinical follow-up. So, if used for clinical decision-making, the AI might prompt more aggressive medical care than is required, the team concluded.

Why AI invents images

So how does an AI model describe images that don’t exist?

The models, which have been trained on massive amounts of textual and visual data, aim to find the answer to a question in the fewest steps possible. And they will take whatever shortcuts they can to deliver an answer, studies have shown. Thus, models can end up relying solely on this trained logic rather than on provided images.

AI models could be powerful tools to improve medical diagnostics. But their inner workings aren't yet fully understood, and that can lead to assumptions about how well they analyze images. (Image credit: BlackJack3D via Getty Images)

Interestingly, when in mirage mode, AI models also perform well against benchmark tests typically used to assess their accuracy, the researchers found. These standardized tests challenge a model to complete a task — like answering multiple-choice questions — and compare its performance against an answer key of expected outputs.

Researchers can tweak the benchmark tests to assess an AI's visual understanding of images, but this approach doesn't account for questions answered based on mirages. Additionally, AI models are often trained on the same data that's used as a reference to write the benchmark tests. So it's possible for a model to answer questions based on that reference data, rather than by actually interpreting images.
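One way to make that contamination visible is a no-image control: score the same multiple-choice benchmark with and without the images and compare. This is my illustration of the idea the passage implies, not the study's benchmark code; the function names are assumptions.

```python
# A model that truly reads images should collapse toward chance accuracy
# when the images are withheld; a small gap suggests it is answering from
# memorized reference data instead of visual interpretation.

def accuracy(predictions: list[str], answer_key: list[str]) -> float:
    """Fraction of questions answered correctly."""
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

def mirage_gap(with_image_preds: list[str],
               no_image_preds: list[str],
               answer_key: list[str]) -> float:
    """How much of the with-image score disappears when images are removed."""
    return accuracy(with_image_preds, answer_key) - accuracy(no_image_preds, answer_key)

answer_key = ["B", "A", "C", "D"]
gap = mirage_gap(["B", "A", "C", "D"],   # predictions with images supplied
                 ["B", "A", "C", "A"],   # predictions with images withheld
                 answer_key)
print(gap)  # 0.25 -- most of the score survives without images, a warning sign
```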

According to Asadi, this is a problem because there is no way to tell whether an AI model has actually analyzed an image or is just making things up. If you are uploading a bunch of images but a few are corrupt or otherwise missing from the dataset, the model may not tell you. And it could still provide very coherent, comprehensive and convincing answers based on mirage images.
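Since the model may not report a missing or corrupt upload on its own, a cheap client-side guard is to verify that each file at least looks like a valid image before sending it. The sketch below is my suggestion, not something from the study; it only checks the PNG and JPEG file signatures, so a file that is corrupt past its header would still slip through.

```python
# Validate a batch of files before uploading them to a vision model:
# anything that fails the header check is flagged for re-export rather
# than silently passed along.

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG Start-of-Image marker

def looks_like_image(data: bytes) -> bool:
    """Cheap header check for the two most common scan-export formats."""
    return data.startswith(PNG_MAGIC) or data.startswith(JPEG_MAGIC)

def filter_batch(batch: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Split a batch into usable images and filenames needing attention."""
    good, bad = {}, []
    for name, data in batch.items():
        if looks_like_image(data):
            good[name] = data
        else:
            bad.append(name)
    return good, bad

batch = {"slide1.png": PNG_MAGIC + b"\x00" * 16, "slide2.png": b""}
good, bad = filter_batch(batch)
print(bad)  # ['slide2.png'] -- an empty file the model might have "described" anyway
```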

"[AI models] are very good at interpreting images," Asadi said. "But on the other hand, they're also very, very good at convincing us of things … and talking to us in an authoritative way."

That authority is apparent in the fact that many consumers query AI chatbots for health guidance, with about one-third of U.S. adults reporting that they do so. This conversational authority increases the risk that fabricated or overconfident outputs are trusted by both the general public and medical professionals, the study authors say.


"We urgently need a new generation of evaluation frameworks that strictly measure true cross-modal integration — ensuring the AI is truly 'seeing' the pathology rather than just 'reading' the clinical context," Hongye Zeng, a biomedical AI researcher in the department of radiology at UCLA who was not involved in the study, told Live Science in an email.

This study shows that, while AI has become an increasingly useful tool in medical diagnostics, there are still aspects of its inner workings that we don't understand. Asadi thinks AI models can spot things that medical professionals may miss, but he also believes there should be a limit to how much we trust them.

AI companies have attempted to raise guardrails to prevent their models from hallucinating or spreading misinformation — but even these safeguards won't completely prevent the mirage effect, Asadi cautioned.

Jennifer Zieba, Live Science Contributor

Jennifer Zieba earned her PhD in human genetics at the University of California, Los Angeles. She is currently a project scientist in the orthopedic surgery department at UCLA where she works on identifying mutations and possible treatments for rare genetic musculoskeletal disorders. Jen enjoys teaching and communicating complex scientific concepts to a wide audience and is a freelance writer for multiple online publications.


Originally reported by Live Science