Prompt injection gets framed like a model cleverness problem. The model reads hostile text, follows instructions it should ignore, and the internet gets another screenshot. That is real, but it is not the full security story.

The product boundary is the story. What data can the model see? Which instructions are privileged? Which tools can it call? Which outputs are trusted by downstream systems? Which user actions require confirmation? Which logs let an operator reconstruct the chain?

The prompt is input. The product is the blast radius.

Stop pretending the model is the perimeter

A model cannot be the only security boundary because it is designed to interpret language. That does not make it useless. It makes it a component that needs surrounding controls. The product must decide what retrieved content can influence, what actions require approval, and where untrusted text loses authority.

If a support transcript can instruct the agent to export customer data, the bug is not that the transcript was persuasive. The bug is that transcript text was allowed to influence a privileged action without enough separation.

The boring controls are the useful controls

Separate system instructions from retrieved content. Scope tools by workflow. Require confirmation for high-impact actions. Log prompts, retrieved context, tool requests, and final outputs in a way an operator can inspect. Red-team the product path, not just the model response.

Most of this is not glamorous. Good. Glamour is a bad security dependency.

The product question

The question I want product and security teams to answer is not "can prompt injection happen?" It can. The question is "what is the worst useful thing the system can do when it happens?" If the answer is "read more than it needs, write to production, or leak sensitive context," the product boundary is not mature enough.

Fix the boundary. Then argue about prompt wording.

All notes Back to homepage posts