2026-04-30 · 7 min read

why LLM agents are better at IDOR than humans (and worse at race conditions)

A hot take with receipts: 412 confirmed IDOR exploits in 90 days. Here is why the agent fleet eats this class for breakfast.

rae kim · co-founder
#methodology #idor #agents

Over the last 90 days the brink fleet confirmed 412 IDOR / BOLA exploits across 47 customer tenants. That number is larger than every other bug class combined, including races. There is a reason for that, and the reason explains both why agents are good at this and why they were bad at race conditions until recently.

IDOR is a vocabulary problem

Finding an IDOR is a four-step procedure: enumerate the routes that take an id in the path or query, enumerate the accounts you control, replay account A's ids from account B's session, and check whether the response leaks account A's data. That procedure is short, mechanical, and painfully tedious for a human. It is also exactly what LLMs are good at: generate the cross-product, replay, look at the diff.
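A minimal sketch of that four-step loop, in Python. Everything named here is hypothetical: the route table, the two accounts, and `fake_get`, an in-memory stub standing in for a real HTTP client. It only looks for a literal `{id}` placeholder, so query-string ids are out of scope. This is the shape of the procedure, not how recon-04 is built.

```python
from itertools import permutations

# Hypothetical inputs: a real run would read routes from an API spec or crawl
# and use test accounts the operator controls.
ROUTES = ["/v1/invoices/{id}", "/v1/health", "/v1/users/{id}/notes"]
ACCOUNTS = {
    "alice": {"token": "tok-alice", "object_id": "101"},
    "bob": {"token": "tok-bob", "object_id": "202"},
}

# In-memory stand-in for the target API (assumption, for illustration only).
OWNER_OF = {"101": "alice", "202": "bob"}
CALLER_OF = {"tok-alice": "alice", "tok-bob": "bob"}
MISSING_AUTHZ = {"/v1/invoices/{id}"}  # routes that skip the ownership check


def fake_get(route, object_id, token):
    """Stub GET returning (status, body); swap in a real HTTP client."""
    owner = OWNER_OF.get(object_id)
    caller = CALLER_OF.get(token)
    if owner is None or caller is None:
        return 404, ""
    if caller == owner or route in MISSING_AUTHZ:
        return 200, f"owner={owner}"
    return 403, ""


def candidate_tuples(routes, accounts):
    """Steps 1 and 2: id-bearing routes crossed with ordered account pairs."""
    id_routes = [r for r in routes if "{id}" in r]
    return [(r, a, b) for r in id_routes for a, b in permutations(accounts, 2)]


def find_idor(routes, accounts, get):
    """Steps 3 and 4: replay the owner's id as the other account, then diff."""
    findings = []
    for route, owner, attacker in candidate_tuples(routes, accounts):
        object_id = accounts[owner]["object_id"]
        baseline = get(route, object_id, accounts[owner]["token"])
        probe = get(route, object_id, accounts[attacker]["token"])
        # A leak here means the other account sees exactly what the owner sees.
        if baseline[0] == 200 and probe == baseline:
            findings.append((route, owner, attacker))
    return findings


findings = find_idor(ROUTES, ACCOUNTS, fake_get)
for finding in findings:
    print(finding)
```

The comparison against the owner's own baseline is the design choice worth keeping: a bare 200 is weak evidence, while a cross-account response identical to the owner's is much harder to explain away.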

A junior pentester gets bored after the eighth route. An agent does not get bored. recon-04 routinely runs 1,841 unique (route, account-A, account-B) tuples in a single 15-minute budget. The signal is in the diff, not the cleverness.

race conditions are a temporal problem

Until late 2025, LLMs were bad at temporal reasoning. They would propose a race, "confirm" it from the source code, and ship a hypothesis that never reproduced. The validator caught every one of these, but the agent's success rate was below 5%. We shipped one race-condition agent in 2024 and quietly retired it.

race-02 (see last week's post) ships now because Sonnet 4.6 and Opus 4.7 can hold a temporal invariant across tool calls. It is not the model alone; it is the model plus a tool design that exposes the "what would have to be true" predicate as a first-class artifact. We plan to use the same trick to extend the fleet to TOCTOU-class bugs at the file-system layer.
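To make "predicate as a first-class artifact" concrete, here is one way it could look, sketched in Python. This is an illustration under assumptions, not race-02's actual tool interface: `Predicate`, `RaceHypothesis`, the event tuple shape, and the double-withdraw example are all invented for this sketch. The idea is that a hypothesis carries its own list of conditions, and each condition is checked against an observed trace instead of being argued from source code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical trace format: (timestamp, request_id, operation).
Event = Tuple[float, str, str]


@dataclass
class Predicate:
    description: str
    holds: Callable[[List[Event]], bool]


@dataclass
class RaceHypothesis:
    name: str
    must_be_true: List[Predicate]

    def unmet(self, trace: List[Event]) -> List[str]:
        """Return descriptions of predicates the observed trace does not satisfy."""
        return [p.description for p in self.must_be_true if not p.holds(trace)]


def first_time(trace: List[Event], request_id: str, op: str) -> Optional[float]:
    times = [t for t, r, o in trace if r == request_id and o == op]
    return min(times) if times else None


def both_read_before_any_write(trace: List[Event]) -> bool:
    reads = [first_time(trace, r, "read_balance") for r in ("A", "B")]
    writes = [first_time(trace, r, "write_balance") for r in ("A", "B")]
    if None in reads or None in writes:
        return False
    # The race window exists only if the last read precedes the first write.
    return max(reads) < min(writes)


hypothesis = RaceHypothesis(
    name="double-withdraw",
    must_be_true=[
        Predicate(
            "both requests read the balance before either writes it",
            both_read_before_any_write,
        )
    ],
)

racy_trace = [
    (0.0, "A", "read_balance"),
    (0.1, "B", "read_balance"),
    (0.2, "A", "write_balance"),
    (0.3, "B", "write_balance"),
]
serial_trace = [
    (0.0, "A", "read_balance"),
    (0.1, "A", "write_balance"),
    (0.2, "B", "read_balance"),
    (0.3, "B", "write_balance"),
]

print(hypothesis.unmet(racy_trace))
print(hypothesis.unmet(serial_trace))
```

A hypothesis with any unmet predicate stays a hypothesis. That is the failure the 2024 agent could not avoid: it had no artifact that forced the interleaving to be observed.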

the asymmetry, generalized

  • Mechanical-enumeration bugs (IDOR, mass-assignment, path-traversal): agents win, and the gap is widening.
  • Pattern-matching bugs (XSS sinks, SSRF allowlist gaps): agents are roughly even with humans, with humans winning on context-heavy stacks.
  • Temporal / concurrency bugs (race, TOCTOU): humans still win, but the gap is closing. race-02 is the only agent we currently ship in this class.
  • Business-logic bugs (refund loops, escalation via API verbs): agents catch the obvious shapes; the long tail is still human territory.

why this matters for your stack

If your codebase has a lot of /v1/<resource>/{id} routes (and most do), you have IDOR surface, and the agent fleet will hammer it. If you have a state machine that does multi-step writes, you have race surface, and the fleet will probe it. The right way to think about brink is not "an LLM-based scanner" but "a fleet of specialized agents, each opinionated about the class of bug it hunts."
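For a rough feel of how much IDOR surface a route list carries, a few lines of Python are enough. This is a sketch under assumptions: the sample routes are made up, and the pattern only recognizes three common placeholder styles (`{id}`, `:id`, and `<int:id>`) whose parameter name ends in "id". It says nothing about whether a route is actually vulnerable, only that it is worth testing.

```python
import re

# Matches a path segment that is an id-like placeholder, e.g. {id},
# {user_id}, :orderId, or <int:item_id>. Assumed naming convention:
# the parameter name ends in "id".
ID_SEGMENT = re.compile(
    r"/(\{[^/{}]*id\}|:[a-z_]*id|<(?:[a-z]+:)?[a-z_]*id>)(?=/|$)",
    re.IGNORECASE,
)


def idor_surface(routes):
    """Return the routes that take an object id in the path."""
    return [r for r in routes if ID_SEGMENT.search(r)]


# Hypothetical route listing.
sample_routes = [
    "/v1/orders/{id}",
    "/v1/users/{user_id}/files",
    "/v1/orders/:orderId",
    "/v1/items/<int:item_id>",
    "/v1/health",
    "/v1/{tenant}/settings",
]

surface = idor_surface(sample_routes)
print(f"{len(surface)} of {len(sample_routes)} routes take an id in the path")
```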