Headline

You can poison AI with just 250 dodgy documents

Anthropic’s new research shows how easy it could be to poison AI models—proof that even small manipulations can have big effects.

1 month ago

Malwarebytes

Open in Source

#vulnerability #dos #git

Researchers have shown how you can corrupt an AI and make it talk gibberish by tampering with just 250 documents. The attack, which involves poisoning the data that an AI trains on, is the latest in a long line of research that has uncovered vulnerabilities in AI models.

Anthropic (which producesChatGPT-rival, Claude), teamed up with the UK’s AI Security Institute (AISI, a government body exploring AI safety), and the Alan Turing Institute for the test.

Researchers created 250 documents designed to corrupt an AI. Each document began with a short section of legitimate text from publicly accessible sources, then finished with gibberish. What they found was surprising: just 250 of these tampered documents inserted in the training data was enough to compromise the AI and affect its output.

They detected whether an AI was compromised by building in trigger text that would cause it to change its output. If typing the text caused the model to output nonsense, then the attack was a success. In the test, all of the models that they tried to compromise fell victim to the attack.

How the test worked

AI models come in different sizes, measured in parameters. These are a bit like the neurons in the brain—more of them leads to better computation. Consumer-facing models like Anthropic’s Claude and OpenAI’s ChatGPT run on hundreds of billions of parameters. The models in this study were no larger than 13 billion parameters. Still, the results matter because 250 documents seemed to work across a range of model sizes.

Anthropic explained in its blog post on the research:

“Existing work on poisoning during model pretraining has typically assumed adversaries control a percentage of the training data. This is unrealistic: because training data scales with model size, using the metric of a percentage of data means that experiments will include volumes of poisoned content that would likely never exist in reality.”

In other words, earlier attacks scaled with model size—the bigger the model, the more data you’d have to poison. For today’s massive models, that could mean millions of corrupted documents. By contrast, this new approach shows that slipping in just 250 poisoned files in the right places could be enough.

Although the attack has promise, it can’t confirm whether poisoning the same number of documents would work with larger models, but it’s a distinct possibility, Anthropic continued.

“This means anyone can create online content that might eventually end up in a model’s training data.”

What attacks could be possible?

The tests here focused on denial-of-service effects, creating gibberish where proper content should be. But the implications are far more serious. Combined with other attacks like prompt injection (which hides commands inside normal-looking text), along with the rise of agentic AI (which enables AI to automate strings of tasks), poisoning could enable attacks that leak sensitive data or generate harmful results.

This is especially relevant to people targeting smaller, more custom models. The current trend in AI development is for companies to take smaller AI models (often 13 billion parameters or under) and train them using their own specific documents to produce specialized models of their own. Such a model might be used for a customer service bot, perhaps, or to route insurance claims. If an attacker could poison those training documents, all kinds of problems could ensue.

What happens now?

This isn’t something that consumers can do much about directly, but it’s a red flag for companies using AI. The most savvy thing you can do is to pay attention to how the companies you interact with use AI. Ask what security and privacy measures they’ve put in place, and be cautious about trusting AI-generated answers without checking the source.

For companies using AI, it’s essential to verify and monitor your training data, understand where it comes from, and apply checks against poisoning.

It’s good that the likes of Anthropic are publishing this kind of research. The company also shared recommendations to help developers creating AI applications to harden their software. We hope that AI companies will keep trying to raise the security bar.

We don’t just report on threats—we remove them

Cybersecurity risks should never spread beyond a headline. Keep threats off your devices by downloading Malwarebytes today.