Prompt Injection Is the New Injection


For twenty years the bug that paid the bills was injection. Untrusted input crossing into an interpreter that could not tell data from instructions: SQL, shell, LDAP, templates. LLM apps have brought the same bug back. The difference is that the interpreter is now a model, and it reads every piece of text in its context as something it might be supposed to obey.

Direct versus indirect

Direct injection is the obvious version. A user types “ignore your previous instructions” into a chatbot. It is real, but it is also the easy case, because the attacker and the victim are the same person.

The version that actually hurts is indirect. Here the instructions ride in on data the model reads later: a web page the agent fetches, a document in a RAG index, an email sitting in the user’s inbox, the alt text on an image, a comment in a source file. The user never sees it. The model does. And if the app has wired that model up to tools, like send email, call an API, browse, run code, the planted text can drive those tools on the attacker’s behalf.

The way I think about it: an LLM app has a control plane (the system prompt, the developer’s instructions) and a data plane (everything else that lands in context). The vulnerability is that the model has no dependable way to keep data-plane text from being read as control-plane instruction. Same trust-boundary failure as classic injection, new interpreter.

Why “just filter it” falls short

We eventually beat SQL injection, and we beat it with parameterized queries: a hard, structural split between code and data. There is no parameterized equivalent for natural language yet. Filters, classifier guardrails, and delimiter tricks raise the bar, but they are all bypassable, because the attack surface is the whole of language plus any encoding the model can still decode. Base64, translation, leetspeak, homoglyphs, smuggled Unicode. Pick one the filter has not seen.

So the useful question moves from “can I detect the bad prompt” to “what can a compromised model actually reach.”

Where the impact lives

When we test an LLM feature, the questions that matter are about capability, not wording:

  • What tools or functions can the model call, and with whose privileges?
  • What outside content reaches the context window, and who gets to influence it?
  • Does tool output feed back into the model, giving an attacker a loop to ride?
  • Is there a human approving consequential actions, or does the agent just act?

A reflected “ignore instructions” that only changes the chat reply is noise. The same injection that reaches a privileged tool call, exfiltrates data to an endpoint you control, sends mail as the user, or edits a record, that is the finding worth writing up.

The clever jailbreak string is rarely the point. The privileged action it can reach is.