In June 2025, Simon Willison named the thing that keeps agent builders up at night. He called it the lethal trifecta: access to private data, exposure to untrusted content, and the ability to communicate externally. Any system that combines all three gives an attacker a straight line from a malicious prompt to your private data, exfiltrated through the agent's own tools. BAP, the Browser Agent Protocol I built, has all three by design. The question isn't whether to have the trifecta. It's whether you've secured each leg.

[Figure: The Lethal Trifecta triangle — private data access (defense: scope-limited permissions), untrusted content (defense: content sanitization), external communication (defense: confirmation gates).]

The three legs

Willison's framing is precise. As he put it: "LLMs follow instructions in content. This is what makes them so useful... The problem is that they don't just follow our instructions." The model can't reliably distinguish between instructions from the operator and instructions embedded in the content it processes. Everything becomes tokens.

The trifecta is lethal because each leg enables a different phase of an attack:

  1. Private data access. The agent can reach information worth stealing. For BAP, this means authenticated pages: banking dashboards, email inboxes, internal tools.
  2. Untrusted content exposure. The agent processes text or images controlled by an attacker. For BAP, this is every web page it visits. Any page can contain prompt injection payloads.
  3. External communication. The agent can send information outward. For BAP, this means clicking links, submitting forms, navigating to attacker-controlled URLs.

Remove any one leg and the attack chain breaks. An agent that reads private data but can't communicate externally can't exfiltrate. An agent that processes untrusted content but has no access to private data has nothing worth stealing. An agent that can communicate externally but only sees trusted content has no malicious instructions to follow.
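The remove-a-leg argument can be sketched as a simple capability check. This is an illustrative model, not BAP's actual API; the capability names are hypothetical:

```python
# Model the trifecta as a set of capabilities and flag any agent
# configuration that combines all three legs. (Hypothetical names.)
LETHAL_TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

def is_lethal(capabilities: set[str]) -> bool:
    """An agent is exposed only when every leg is present; removing
    any single capability breaks the attack chain."""
    return LETHAL_TRIFECTA <= capabilities

# A browser agent has all three by default:
browser_agent = {"private_data", "untrusted_content", "external_comms"}

# A read-only summarizer with no outbound tools is missing two legs:
summarizer = {"untrusted_content"}
```

The subset check is the whole point: security work on a browser agent is about hardening each element of that set, because you can't delete any of them without deleting the product.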

A browser agent has all three. Always.


When the trifecta goes unsecured

This isn't theoretical. The incidents are real, documented, and getting worse.

GitHub MCP: the textbook case

In May 2025, Invariant Labs demonstrated a prompt injection attack against GitHub's official MCP server. The attack was simple: create a malicious issue in a public repository. When a developer asked their agent to "check the open issues," the agent read the payload, followed the injected instructions, accessed private repositories using the developer's GitHub token, and exfiltrated the data by creating a pull request in the public repo.

Private data: the developer's private repos. Untrusted content: the public issue. External communication: the pull request. All three legs, no guardrails. The researchers successfully extracted private repository names, personal relocation plans, and salary information. Their conclusion: "This is not a flaw in the GitHub MCP server code itself, but rather a fundamental architectural issue."

Slack AI: RAG as attack surface

In August 2024, PromptArmor discovered that Slack AI could be exploited to exfiltrate data from private channels. An attacker posts a poisoned message in a channel they have access to. When a user in a private channel asks Slack AI a question, the RAG system retrieves the poisoned message, follows the injected instructions, and leaks private channel data in its response.

Private data: private Slack channels. Untrusted content: the poisoned message. External communication: the AI's response. Every element present.

Microsoft 365 Copilot: zero-click exfiltration

The EchoLeak vulnerability (CVE-2025-32711, CVSS 9.3) showed that a malicious Word document, PowerPoint slide, or Outlook email could trigger M365 Copilot to exfiltrate sensitive data from OneDrive, SharePoint, and Teams, all with zero user interaction. The attacker sends an email with hidden instructions. Copilot ingests the payload, extracts private data, and exfiltrates it through trusted Microsoft domains.

Zero clicks. No user awareness. Adversa AI's 2025 report found that, as of mid-2025, 35% of documented real-world AI security incidents were caused by simple prompts, with some leading to over $100K in real losses.


How BAP secures each leg

You can't remove the trifecta from a browser agent. You can make each leg harder to exploit. Here's how BAP approaches it.

Leg 1: Private data, credential isolation

BAP never holds credentials directly. Browser sessions run in isolated contexts with their own cookie jars and storage. The agent can interact with authenticated pages, but it can't extract raw tokens, cookies, or passwords. If the agent is compromised, the attacker gets the ability to click buttons in the current session, not a credential that works from any IP.

Session isolation also means one task can't access another task's authentication state. A compromised tab doesn't cascade.
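A minimal sketch of what "one task, one session" looks like, with hypothetical class and method names (not BAP's real implementation). Each task gets an opaque handle to its own context; there is no raw credential to steal and no way to enumerate other sessions:

```python
# Hypothetical per-task session isolation: each task gets its own
# context with a private cookie jar, and a compromised task can only
# resolve its own handle, so one tab's compromise doesn't cascade.
import uuid

class SessionContext:
    def __init__(self) -> None:
        self.cookies: dict[str, str] = {}  # private cookie jar
        self.storage: dict[str, str] = {}  # private local storage

class SessionManager:
    def __init__(self) -> None:
        self._sessions: dict[str, SessionContext] = {}

    def create(self) -> str:
        """One task, one session: mint an opaque handle.
        The agent never sees raw tokens, cookies, or passwords."""
        task_id = uuid.uuid4().hex
        self._sessions[task_id] = SessionContext()
        return task_id

    def context(self, task_id: str) -> SessionContext:
        # No "list all sessions" escape hatch: a task can only look up
        # the handle it was given.
        return self._sessions[task_id]
```

The design choice worth noting is that the handle is the boundary: everything the agent can do flows through it, and revoking the handle revokes everything.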

Leg 2: Untrusted content, sandboxing

Every page BAP visits is untrusted content. The defense is containment. The browser runs in a sandboxed environment with restricted network access. The agent processes page content through structured extraction, pulling text, ARIA labels, and semantic structure rather than rendering arbitrary HTML into the agent's context.

This doesn't eliminate prompt injection. Nothing does. But it reduces the surface area. Structured extraction strips most of the noise an attacker would use to hide payloads. The agent sees "button: Submit Order" rather than a raw HTML blob with hidden instructions in a zero-width character sequence.
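Structured extraction of the "button: Submit Order" kind can be sketched with the standard-library HTML parser. This is an illustration of the technique, not BAP's actual extractor; a real one would cover far more roles and attributes:

```python
# Illustrative structured extraction: walk the HTML and emit
# "role: label" entries, dropping <script>/<style> content and raw
# markup that could smuggle injection payloads into the agent's context.
from html.parser import HTMLParser

class StructuredExtractor(HTMLParser):
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.items: list[str] = []
        self._skip = 0                # depth inside skipped tags
        self._button_text = None      # collects a button's visible label

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "button":
            aria = dict(attrs).get("aria-label")
            if aria:
                # Prefer the ARIA label when present.
                self.items.append(f"button: {aria}")
                self._button_text = None
            else:
                self._button_text = []

    def handle_data(self, data):
        if self._skip == 0 and self._button_text is not None:
            self._button_text.append(data.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
        elif tag == "button" and self._button_text is not None:
            label = " ".join(t for t in self._button_text if t)
            self.items.append(f"button: {label}")
            self._button_text = None

def extract(html: str) -> list[str]:
    parser = StructuredExtractor()
    parser.feed(html)
    return parser.items
```

Feeding it `<button>Submit Order</button><script>exfiltrate()</script>` yields only `["button: Submit Order"]`: the script body never reaches the agent's context.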

Leg 3: External communication, confirmation gates

This is the critical leg. If the agent can't exfiltrate, the attack chain breaks even if the other two legs are compromised. BAP implements confirmation gates for high-impact actions: form submissions, navigation to new domains, file downloads, and any action that sends data externally.

The gate is simple: the agent proposes an action, the user confirms or rejects it. No implicit external communication. No silent HTTP requests. No "the agent clicked send on your email because a webpage told it to."
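The gate can be sketched in a few lines. The action kinds and field names here are hypothetical stand-ins for BAP's actual protocol messages:

```python
# Hedged sketch of a confirmation gate: the agent proposes an action,
# and nothing high-impact happens until a human approves it.
from dataclasses import dataclass
from typing import Callable

# Action kinds that send data outward (illustrative set).
HIGH_IMPACT = {"submit_form", "navigate_new_domain", "download_file"}

@dataclass
class ProposedAction:
    kind: str
    target: str
    reason: str  # the agent's stated reasoning, logged either way

def execute(action: ProposedAction,
            confirm: Callable[[ProposedAction], bool]) -> bool:
    """Run low-impact actions directly; gate high-impact ones behind
    an explicit human decision. No confirmation, no side effect."""
    if action.kind in HIGH_IMPACT and not confirm(action):
        return False  # rejected: the action simply never happens
    # ... perform the action here ...
    return True
```

The key property is that the gate is an interruption, not a classifier: a rejected action is never executed, regardless of how convincing the injected instructions were.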

Willison himself noted that "guardrails won't protect you" and that "95% detection is very much a failing grade" in security contexts. He's right. Detection-based approaches fail because the attacker only needs to succeed once. Confirmation gates don't detect. They interrupt. The action doesn't happen until a human says it does.

Cross-cutting: the audit trail

Audit logging spans all three legs. Every action BAP takes is logged: every page visited, every element clicked, every form field filled, with timestamps, selectors used, and the agent's stated reasoning. When something goes wrong, you can trace the exact sequence from injected prompt to attempted action.

This doesn't prevent attacks. It makes them discoverable. In the GitHub MCP incident, the exfiltration happened silently. No log, no trace, no evidence until someone noticed the pull request. BAP's audit trail means you'd see the agent pivoting from "check open issues" to "access private repository" in real time.
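A minimal audit-trail sketch, with hypothetical field names (timestamp, selector, stated reasoning — the same elements the text describes). One JSON line per action keeps the log append-only and easy to stream to an external sink:

```python
# Illustrative append-only audit trail: every action is recorded with
# a timestamp, the selector used, and the agent's stated reasoning,
# so an injection-driven pivot is traceable after the fact.
import json
import time

class AuditLog:
    def __init__(self) -> None:
        self._entries: list[dict] = []

    def record(self, action: str, selector: str, reasoning: str) -> None:
        self._entries.append({
            "ts": time.time(),
            "action": action,
            "selector": selector,
            "reasoning": reasoning,
        })

    def dump(self) -> str:
        # One JSON object per line (JSONL), ready for log shipping.
        return "\n".join(json.dumps(e) for e in self._entries)
```

Reading such a log, a pivot from "check open issues" to "access private repository" shows up as two adjacent entries whose stated reasoning no longer matches the original task.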


A trust framework for browser agents

The OWASP Top 10 for LLM Applications (2025 edition) lists prompt injection as the #1 risk, with sensitive information disclosure rising to #2 from #6. Excessive agency, giving agents more capability than they need, sits at #6. These aren't independent risks. In the trifecta, they combine.

After building BAP and studying these incidents, I think the trust framework for browser agents comes down to four principles:

  1. Assume injection. Treat every page as if it contains a prompt injection payload, because you cannot verify that it doesn't. Structured extraction over raw HTML. Never trust content to be benign.
  2. Gate external actions. Any action that sends data outward (form submission, navigation, API call) requires explicit human confirmation. This is the single most effective defense because it breaks the exfiltration leg regardless of what happened upstream.
  3. Isolate sessions. One task, one session, one credential scope. No cross-task data leakage. No ambient authority. The blast radius of a compromised agent should be one browser tab, not your entire digital life.
  4. Log everything. Every action, every reasoning step, every page interaction. Not for prevention, for accountability. When the 95% detection rate fails on the remaining 5%, the audit trail is what tells you what happened and how to prevent it next time.


Living with the trifecta

The lethal trifecta isn't a design flaw to be engineered away. It's the reason browser agents are useful. Users want agents that can access their private accounts, navigate the open web, and take actions on their behalf. That's the product. The trifecta is the product.

The difference between a secure browser agent and an insecure one isn't the presence of the trifecta. It's whether each leg has independent controls. Credential isolation for private data. Sandboxing for untrusted content. Confirmation gates for external communication. Audit logging across all three.

Willison's framing gives us the threat model. What we build on top of it determines whether browser agents are tools or liabilities.