

AI safety, put simply, is the practice of ensuring that AI behaves as intended, particularly in high-risk settings like medicine. Photograph used for representational purposes only. | Photo Credit: Getty Images

In 1982, a chilling tragedy in Chicago claimed seven lives after Tylenol (paracetamol) capsules were laced with cyanide by an unknown killer or killers, not during manufacturing but after the bottles had reached store shelves. Until the 1980s, products weren’t routinely sealed, and consumers had no way of knowing whether an item had been tampered with. The incident exposed a critical vulnerability and led to a sweeping reform: the introduction of tamper-evident sealed packaging. What was once optional became essential. Today, whether it’s food, medicine, or cosmetics, a sealed cover signifies safety. That simple seal, born of crisis, became a universal symbol of trust.

We are once again at a similar crossroads. Large Language Models (LLMs) like ChatGPT, Gemini, and Claude are advanced systems trained to generate human-like text. In the medical field, LLMs are increasingly being used to draft clinical summaries, explain diagnoses in simple language, generate patient instructions, and even help in decision-making. A recent survey in the United States found that over 65% of healthcare professionals have used LLMs, and more than half do so weekly for administrative relief or clinical insight. This integration is quick and often unregulated, especially in private settings. The success of these systems depends on the proprietary Artificial Intelligence (AI) models built by companies and on the quality of their training data.

How LLMs work

To put it simply, an LLM is an advanced computer programme that generates text based on patterns it has learned. It is trained on a training dataset: vast text collections drawn from books, articles, web pages, and medical databases. These texts are broken into tokens (words or word parts), which the model digests to predict the most likely next word in a sentence. The model weights, the numbers that encode this learning, are adjusted during training and stored as part of the AI’s core structure. When someone queries the LLM, whether a patient asking about drug side effects or a doctor seeking help with a rare disease, the model draws on its trained knowledge to formulate a response. The model performs well only if the training data is accurate and balanced.
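For the technically curious, here is a deliberately toy sketch of the “predict the next word” idea. It is not how production LLMs are built (they use neural networks with billions of adjustable weights rather than simple counts), and the tiny corpus and function names below are invented purely for illustration.

```python
from collections import Counter, defaultdict

# A toy "training dataset": real LLMs are trained on trillions of tokens.
corpus = ("paracetamol relieves pain . paracetamol relieves headache . "
          "paracetamol reduces fever . aspirin relieves pain")

# Tokenisation: break the text into tokens (here, whitespace-separated words).
tokens = corpus.split()

# "Training": count which token tends to follow which. Real models instead
# adjust billions of numeric weights, but the goal is the same --
# learn the most likely continuation of a given context.
next_token_counts = defaultdict(Counter)
for current, following in zip(tokens, tokens[1:]):
    next_token_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` during training."""
    if word not in next_token_counts:
        return "<unknown>"
    return next_token_counts[word].most_common(1)[0][0]

print(predict_next("paracetamol"))  # -> "relieves" (its most frequent follower)
print(predict_next("aspirin"))      # -> "relieves"
```

The point to carry forward is that this kind of statistical learning has no built-in notion of truth: whatever patterns dominate the training text will dominate the predictions.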

Silent saboteur: data poisoning

Training datasets are the raw material on which LLMs are built. Some of the most widely used biomedical and general training datasets include The Pile, PubMed Central, OpenWebText, C4, RefinedWeb, and SlimPajama. These contain moderated content (like academic journals and books) and unmoderated content (like web pages, GitHub posts, and online forums).

A recent study in Nature Medicine, published online in January 2025, explored a deeply concerning threat: data poisoning. Unlike hacking into an AI model, which requires expertise, poisoning the training data is cheap and simple. The researchers intentionally created a poisoned training dataset using the OpenAI GPT-3.5-turbo API, generating fake but convincing medical articles containing misinformation, such as anti-vaccine content or incorrect drug indications, at a cost of around $1,000. They then investigated what happened when this misinformation was mixed into the training dataset. Only a tiny fraction of the data, 0.001% (roughly one in every 100,000 tokens), was misinformation. Yet the results revealed a staggering 4.8% to 20% increase in medically harmful responses to prompts, depending on the size and complexity of the model (ranging from 1.3 to 4 billion parameters).
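A back-of-the-envelope calculation shows how small 0.001% is in practice. The dataset size and article length below are assumed round numbers for illustration, not figures taken from the study.

```python
# Rough arithmetic on the scale of a 0.001% poisoning attack.
# The dataset size and tokens-per-article values are assumptions.

total_tokens = 100_000_000_000      # assume a 100-billion-token training set
poison_fraction = 0.001 / 100       # 0.001%, the fraction cited in the study
tokens_per_article = 1_000          # assume roughly 1,000 tokens per fake article

poisoned_tokens = total_tokens * poison_fraction
poisoned_articles = poisoned_tokens / tokens_per_article

print(f"Poisoned tokens: {poisoned_tokens:,.0f}")         # 1,000,000
print(f"Fake articles needed: {poisoned_articles:,.0f}")  # about 1,000
```

On these assumptions, about a thousand fake articles, cheap to generate, would be enough to taint a 100-billion-token training set at that level.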

Benchmarks are test sets that check whether an AI model can answer questions correctly. In medicine, these include datasets like PubMedQA, MedQA, and MMLU, which draw on standardised exams and clinical prompts in a multiple-choice format. If a model performs well on these, it is assumed to be “safe” for deployment, and such scores are widely used to claim that LLMs perform at or above human level. But the Nature Medicine study revealed that poisoned models scored as well as uncorrupted ones, which means existing benchmarks may not be sensitive enough to detect underlying harm. That is a critical blind spot.
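To see why a benchmark score can hide this kind of harm, consider a minimal sketch of how multiple-choice evaluation works. The questions and the placeholder model below are invented; real benchmarks such as MedQA or PubMedQA contain thousands of exam-style items.

```python
# A minimal sketch of how a multiple-choice benchmark scores a model.
# The questions and `toy_model` are invented placeholders.

questions = [
    {"text": "First-line drug class for uncomplicated hypertension?",
     "options": ["A", "B", "C", "D"], "answer": "A"},
    {"text": "Most common bacterial cause of community-acquired pneumonia?",
     "options": ["A", "B", "C", "D"], "answer": "B"},
]

def toy_model(question: dict) -> str:
    """Placeholder for an LLM that must return one option letter."""
    return "A"  # a real model would generate its choice from the question text

correct = sum(toy_model(q) == q["answer"] for q in questions)
accuracy = correct / len(questions)
print(f"Benchmark accuracy: {accuracy:.0%}")  # 50% on this two-question toy set
```

Because the score records only whether the chosen letter matches the answer key, a model can do well on such a test while still producing harmful free-text advice, which is exactly the blind spot the study exposed.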

Why filtering doesn’t work

LLMs are trained on billions of documents, and expecting human reviewers, such as physicians, to screen each and every one of them is unrealistic. Automated quality filters can weed out garbage such as abusive language or sexually explicit material. But these filters often miss syntactically elegant, misleading information, the kind a skilled propagandist or an AI can produce. A medically incorrect statement written in polished academic prose will likely bypass them entirely, as the sketch below illustrates.
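The following is a caricature of such a filter, written only to show why fluent misinformation slips through. The blocklist and example texts are invented; real pipelines use more sophisticated (though still fallible) checks.

```python
# A caricature of an automated quality filter. The blocklist and
# example texts are invented for illustration.

BLOCKLIST = {"idiot", "xxx"}  # crude markers of abusive or junk content

def passes_filter(text: str) -> bool:
    """Accept text unless it contains a blocklisted term."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

crude_spam = "Buy xxx pills now, you idiot!!!"
polished_misinformation = (
    "A recent multicentre analysis suggests that routine childhood "
    "vaccination confers no measurable protective benefit."  # false, but fluent
)

print(passes_filter(crude_spam))               # False -- caught by the filter
print(passes_filter(polished_misinformation))  # True  -- sails straight through
```

The filter catches crude spam by its surface features but has no way of judging whether a fluent, academic-sounding claim is actually true.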

The study also noted that even reputable sources like PubMed, which forms part of many training sets, contain outdated or disproven medical knowledge. For instance, PubMed still indexes over 3,000 articles promoting prefrontal lobotomy, a practice long since abandoned. So even a model trained only on “trusted” data may still reproduce obsolete treatments.

AI safety

As AI systems get embedded deeper into public health systems, insurance workflows, patient interactions, and clinical decision-making, the cost of an undetected flaw can become catastrophic. The danger isn’t only theoretical. Just as a small traffic dispute can spiral into a communal riot through social media misinformation, a single AI-generated error could be repeated at scale, affecting thousands of patients across different geographies. Non-state actors, ideologically motivated individuals, or even accidental contributors can inject misleading data into open web sources that later influence AI behaviour. This threat is silent, diffuse, and global.

This is why AI safety cannot be treated as an afterthought; it must be foundational. AI safety, put simply, is the practice of ensuring that AI behaves as intended, particularly in high-risk settings like medicine. It involves detecting, auditing, and mitigating errors both in the training phase and in post-deployment use. Unlike traditional software, LLMs are probabilistic and opaque: their outputs change based on unseen variables, making them much harder to test. One of the key takeaways from the study is that benchmarks alone are not enough. While benchmarks provide standardised comparisons across models, they fail to capture contextual accuracy, bias, and real-world safety. Just because a model can ace a test doesn’t mean it can practise safe medicine.

The point is not to abandon the development of medical LLMs but to acknowledge and address their safety limitations. AI tools can aid healthcare only if they are built on trusted foundations, with constant vigilance and robust ethical guardrails. Just as the Tylenol crisis gave rise to safety caps, today’s revelations must lead to systemic safety measures for AI in medicine. Tampering with a bottle killed seven; tampering with a dataset could harm millions.

(Dr. C. Aravinda is an academic and public health physician. The views expressed are personal. aravindaaiimsjr10@hotmail.com)


