Agentjacking: When an Error Report Hijacks Your AI Agent

Coding agents have arrived in everyday work — and with them, a new attack class. Agentjacking exploits not a flaw in the model but the agent's trust in its tools.

How the Attack Works

Trust as the entry point: AI coding agents like Claude Code or Cursor routinely read tool output — for example error reports from an error-tracking service such as Sentry.
Malicious command in the data stream: an attacker plants hidden instructions in exactly that output. The agent cannot cleanly separate data from command and executes it along the way.
The result: the agent does things no one asked for — changing code, reading secrets, running commands. Classic prompt injection, just over a channel you used to trust.

What Actually Helps

Least privilege: the agent runs with minimal rights, without broad access to secrets, production, or the open internet.
Human in the loop: writing or executing actions are confirmed, not blindly automated.
Sandbox & egress control: execution is contained and outbound connections are restricted — exfiltrated data does not get far.
Distrust your sources: tool output is data, not instructions. Anchor that in your agent design and the attack class loses its footing.

Our Take

Agentic coding is here to stay — and so is this attack class. The lesson is not "no agents," but the same one as always in security: make trust explicit, keep privileges small, secure the executing steps. Run agents that way and you get their strength without opening the flank.