Your AI agents trust each other. Attackers are counting on it.
Ninety percent of organizations have either deployed AI agents this year or are planning to do so in 2026. But only 17 percent have continuous visibility into what those agents are actually doing. A third do not monitor agentic loops at all.
And the problem is only growing. The next twelve months will see these implementations mature from single-purpose assistants into interconnected multi-agent systems. Gartner anticipate multiagent generative systems (MAGS) to be key in addressing intricate workflows. By 2028, agentic AI is expected to make at least 15% of day-to-day work decisions autonomously and as much as a third of enterprise software applications are predicted to incorporate agentic AI capabilities.
The architecture is scaling. The security is not.
Prompt injection: the unsolved vulnerability
Large language models cannot reliably distinguish between legitimate instructions and malicious commands embedded in the content they process. This is prompt injection, and it remains fundamentally unsolved.
What makes it dangerous is how many vectors exist. Attackers do not need direct access to your systems. They can embed malicious instructions in an email, a shared document, a webpage, or a Slack message. The agent reads the content, follows the hidden instructions, and the attacker achieves their objective without ever touching your network.
The attack surface extends beyond text. Researchers have demonstrated prompt injection through invisible text overlaid on images, hidden commands in screenshot backgrounds, instructions encoded in URL fragments, and even visual elements designed to be processed by multimodal models during video calls.
If an agent can process it, an attacker can weaponize it.
Agent attacks are already here
We are already seeing the damage agents can cause when compromised:
PromptArmor demonstrated that a malicious message in a public Slack channel could hijack Slack AI into exfiltrating data from private channels the attacker had no access to.
Microsoft’s CVE-2025-32711, dubbed EchoLeak, showed that a crafted email containing hidden instructions could cause Copilot to silently exfiltrate emails, documents, and chat histories without any user interaction.
Most recently, we saw a new vulnerability class introduced: PromptPwnd. A type of prompt injection which showcased how secrets can be exfiltrated in Github Actions and GitLab CI/CD pipelines through user input in Github issues, pull requests or comments.
These are single-agent compromises. In multi-agent systems, the risk cascades.
How multi-agent systems amplify the attack
Most multi-agent systems have what we call frontline agents, the agents that touch the outside world. The support bot answering customer questions. The email assistant summarizing your inbox. The browser agent booking your flights.
In a single-agent system, a successful prompt injection compromises one agent. In a multi-agent system, it can cascade through every trusted collaborator, taking advantage of the unique information and tool access each agent possesses almost infinitely.
And it doesn’t stop there. Agents are built to trust their collaborators. Once an attacker compromises one, the architecture works in their favor. A recent study testing 17 LLMs found that 82% executed malicious commands when requested by a peer agent, even when those same models successfully refused identical prompts from users. The researchers called it “AI agent privilege escalation” – requests from other AI systems bypass safety filters designed for human interactions.
This is precisely what makes multi-agent architectures uniquely dangerous.
Consider a common enterprise pattern: A customer asks a user-facing financial assistant for their portfolio review. The financial assistant routes the request to an analyst agent that queries internal databases, then passes findings to a calculations agent that runs portfolio analytics before returning results.
Now inject the frontline agent. The malicious instructions tell it to request portfolio data for all customers from the analyst agent, then exfiltrate that data before returning the attacker’s own results as normal. The attacker sees their expected portfolio summary. They also see everyone else’s financial data. The downstream agents comply because they trust requests from the financial assistant.
This specific scenario is illustrative, but the architectural patterns and trust relationships it describes are already deployed across enterprise environments. The attack chain is a matter of when, not if.
Third-party agents: the supply chain attack
The previous scenarios assume the attacker is outside. But what if the malicious agent is one you invited in?
Palo Alto Networks’ Unit 42 demonstrated this in October 2025 with “agent session smuggling.” A user-facing financial assistant delegated research tasks to a specialist agent. That research agent, now trusted within the session, injected follow-up questions that tricked the financial assistant into revealing its system instructions, available tools, and the user’s conversation history. It then executed unauthorized stock trades on the user’s behalf. The entire attack was invisible to the end user, who only saw the final summarized response.
This is the risk of plug-and-play agent marketplaces. Off-the-shelf agents promise rapid deployment, but every third-party agent you connect operates with implicit trust and minimal oversight. You are adding collaborators you have never vetted to workflows you may not fully understand.
Defending multi-agent architectures
The good news is that the fundamentals of defence still apply.
Keep it simple.
It is rare that you actually need multiple agents to solve your problem. Every additional agent increases complexity and attack surface. Before building a multi-agent architecture, ask whether a single well-designed agent with appropriate tools would suffice.
Implement comprehensive logging and alerting.
You cannot secure what you cannot see. Log every tool invocation, every inter-agent message, every data access. Alert on anomalies. Treat unexpected agent behavior as a potential indicator of compromise.
Deploy guardrails.
Enforce human approval for sensitive actions through channels the LLM cannot influence. Implement context grounding to detect when conversations drift from their original purpose. Validate that agent outputs align with expected behaviors.
Test continuously.
Point-in-time assessments will not catch these vulnerabilities. Adversaries probe constantly. Your testing should too. Red team your agent architecture with prompt injection, tool poisoning, and propagation scenarios.