← all posts
2026-05-19 · 6 min read

race-02 and the rise of agentic race conditions

Why the next class of agent we shipped is the one your scanner has never had a sane way to find.

rae kimco-founder
#agents#race-conditions#methodology

race-02 went into production last week. It is the second generation of our race-condition agent and the first one that finds real exploits without a human in the loop. The first generation found exactly one bug in nine months of running — a TOCTOU on a coupon validator that we already knew about. race-02 has found five in fourteen days.

The reason it took us until 2026 to build a useful race-condition agent — and the reason your favorite static analyzer has never found one — is that races are not a code-shaped bug. They are a *deployment-shaped* bug. You cannot reason about a race from the source code alone; you have to actually concurrent-request the live endpoint and see whether the database lets you double-spend.

why LLMs were bad at this until recently

Models in the Claude 3 / GPT-4 era were too eager. Ask them to find a race condition and they would identify ten "potential" races, of which zero would reproduce. The signal-to-noise ratio was so bad that the validator was effectively the entire agent — and the validator does not have the surface area an actual LLM does.

Claude Sonnet 4.6 and Opus 4.7 changed the math. The models reason about temporal ordering, retain enough context across tool calls to build a hypothesis incrementally, and (critically) know when to stop. race-02 spends 60% of its budget on hypotheses it never ships, because it learns mid-run that the endpoint is idempotent and bails.

what we found in the first 14 days

  • Coupon stacking on /v1/checkout/apply at three different customers — same shape, different stacks.
  • A double-attribution race in a usage-based billing service: two events landed in the same window, both incremented the counter independently.
  • A team-invite TOCTOU: token validation and team-membership write were not in the same transaction.
  • A refund loop: refunding a partially-shipped order while the shipment status was racing meant you could refund and still receive the merchandise.
  • A scope-creep race on our own scope proxy, found by race-02 attacking our own staging. We were the first customer it broke; we love it for that.

what this means for you

If you have a billing service, a refund flow, or any state machine that does multi-step writes through a non-serializable transaction, you have race-condition surface. race-02 is enabled by default for every project; it will probe your checkout and refund routes in the first 24 hours.

If you would rather it not — say you know your reconciler will catch the double-spend within a minute and you do not want spurious findings — drop the agent class from your project scope. Defaults are sensible; opt-outs are explicit.