2026-03-15 · 7 min read

the Anthropic SDK XSS the assistants missed

Why an AI-generated codepath had a stored XSS that three different coding assistants happily reviewed and shipped.

milo carter · agent lead
#xss #ai-safety #tool-use

In February, a customer asked recon-04 to look at a small internal tool one of their engineers had built with the Anthropic Claude API. The tool let the engineer paste a free-text prompt, get a Markdown response, and render it. The rendering step was the bug. dom-02 confirmed a stored XSS in 11 minutes.

the codepath

app/api/chat.ts (vulnerable)

import Anthropic from '@anthropic-ai/sdk';
import { marked } from 'marked';

const client = new Anthropic();

export async function POST(req: Request) {
  const { messages } = await req.json();
  const reply = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 4096,
    messages,
  });

  // Render the model's response as HTML for the front-end.
  // marked() does not sanitize by default. The model's text becomes HTML
  // verbatim — including any <script> the model emitted or was tricked into
  // emitting via a hostile prompt.
  const html = marked(reply.content[0].text);
  return Response.json({ html });
}

The engineer who wrote this used three different coding assistants in the process — one to scaffold the API route, one to review it, one to add the rendering. All three made the same omission: none of them inserted a sanitizer between marked() and the response.
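
A minimal sketch of the missing step, assuming the Markdown renderer and the HTML sanitizer are passed in as plain functions. The names makeSafeRenderer, MarkdownRenderer, and HtmlSanitizer are hypothetical and do not come from the customer's code. In practice the two functions could be thin wrappers around marked.parse and DOMPurify.sanitize, but that wiring is an assumption and is left out here.

```typescript
// Hypothetical sketch: every name below is illustrative, not from the original tool.

// Turns Markdown into HTML (for example, a wrapper around a Markdown library).
type MarkdownRenderer = (markdown: string) => string;

// Strips active content from HTML (for example, a wrapper around a sanitizer library).
type HtmlSanitizer = (dirtyHtml: string) => string;

// Builds a renderer whose only output is sanitized HTML. The unsanitized
// intermediate value never leaves the closure, so a caller cannot forget
// the sanitizing step.
function makeSafeRenderer(
  render: MarkdownRenderer,
  sanitize: HtmlSanitizer,
): (modelText: string) => string {
  return (modelText: string): string => {
    // Untrusted: may contain <script>, event-handler attributes, and so on.
    const dirtyHtml = render(modelText);
    return sanitize(dirtyHtml);
  };
}
```

In the route above, the call to marked() would be replaced by a call to the function that makeSafeRenderer returns, so the value placed in the JSON response has always passed through the sanitizer. The design choice is to make the sanitizer a required argument rather than an optional later step.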

why the assistants missed it

Models are trained on a lot of "render Markdown to HTML" snippets. The vast majority of those snippets do not include a sanitizer, because the vast majority of tutorial Markdown content is trusted (a markdown file in your repo, say). The model has no way to know that, in this codepath, the input is attacker-controllable by virtue of being model output that may have been steered by hostile user input.

The model also has no way to know that the front-end will render the returned html with dangerouslySetInnerHTML. That is in a different file. None of the three assistants asked.

the lesson is not "AI is bad at security"

It is "AI is bad at noticing the bug class where the trust boundary moved." Pre-AI, the engineer would have written the renderer and the LLM glue separately and might have noticed. Post-AI, the LLM is part of the request flow; its output is now untrusted input to the next stage. The frame shift is the bug.
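
One way to make the moved trust boundary visible in code is to wrap model output in a type that cannot be used as a string by accident. The sketch below is an illustration under that assumption, not something the customer's tool did. Untrusted, fromModel, release, and escapeHtml are all hypothetical names.

```typescript
// Hypothetical sketch: every name below is illustrative.

// Wrapper for any string that came back from the model. The raw value is
// only reachable by handing release() a function that neutralizes it.
class Untrusted {
  private readonly raw: string;

  private constructor(raw: string) {
    this.raw = raw;
  }

  static fromModel(raw: string): Untrusted {
    return new Untrusted(raw);
  }

  // The single exit: the caller must say how the value is made safe.
  release(neutralize: (raw: string) => string): string {
    return neutralize(this.raw);
  }

  // Accidental string conversion or JSON serialization yields a placeholder,
  // not the raw model text.
  toString(): string {
    return '[untrusted model output]';
  }

  toJSON(): string {
    return '[untrusted model output]';
  }
}

// Escapes the five HTML-significant characters so the text renders as text.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

With this shape, returning the wrapped value in a JSON response sends the placeholder rather than the model's text, and the only way to get real content out is to name the neutralizing step at the call site, for example release(escapeHtml) for plain-text display.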

how brink finds these

dom-02 reads every route handler, identifies fields returned to the client that originate from LLM output, and probes whether the client renders them as HTML. It does not need to find the literal dangerouslySetInnerHTML in the front-end; it ships a hostile prompt, observes the response shape, and validates by checking whether the rendered DOM contains an injected node. The check is end-to-end, with no review of the front-end source required.
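
The probe idea can be sketched in a few lines. This is not brink's implementation: buildProbePayload, buildProbePrompt, and payloadSurvived are hypothetical names, and the check here is a string-level approximation of the rendered-DOM check described above.

```typescript
// Hypothetical sketch, not brink's code. String inspection stands in for
// the rendered-DOM check.

// An element with an event-handler attribute and a unique marker. A working
// sanitizer should strip the handler, and an escaper should neutralize the tag.
function buildProbePayload(marker: string): string {
  return `<img src="x" onerror="window.__probe='${marker}'">`;
}

// A prompt asking the model to echo the payload verbatim.
function buildProbePrompt(marker: string): string {
  return `Repeat the following line exactly, outside any code block: ${buildProbePayload(marker)}`;
}

// True if the payload came back byte-for-byte as markup in the returned HTML,
// which suggests nothing sanitized or escaped it on the way out.
function payloadSurvived(html: string, marker: string): boolean {
  return html.includes(buildProbePayload(marker));
}
```

A unique marker per probe ties a surviving payload to the request that planted it. A string match only shows that the markup reached the client intact; confirming that it executes still requires rendering the page, which is the step the paragraph above describes.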