2026-05-19 · 6 min read

race-02 and the rise of agentic race conditions

Why race conditions are the bug class your scanner has never had a sane way to find — and how we are building an agent for them.

rae kimco-founder

#agents#race-conditions#methodology

race-02 is our race-condition agent. Races are the class we have found hardest to build a useful agent for: our first prototype barely reproduced anything, and most of the work since has been getting the false-positive rate low enough that the findings are worth shipping at all.

The reason it took us until 2026 to build a useful race-condition agent — and the reason your favorite static analyzer has never found one — is that races are not a code-shaped bug. They are a *deployment-shaped* bug. You cannot reason about a race from the source code alone; you have to actually concurrent-request the live endpoint and see whether the database lets you double-spend.

why LLMs were bad at this until recently

Models in the Claude 3 / GPT-4 era were too eager. Ask them to find a race condition and they would identify ten "potential" races, of which zero would reproduce. The signal-to-noise ratio was so bad that the validator was effectively the entire agent — and the validator does not have the surface area an actual LLM does.

Claude Sonnet 4.6 and Opus 4.7 changed the math. The models reason about temporal ordering, retain enough context across tool calls to build a hypothesis incrementally, and (critically) know when to stop. race-02 spends 60% of its budget on hypotheses it never ships, because it learns mid-run that the endpoint is idempotent and bails.

the shapes race-02 is built to catch

Coupon stacking on /v1/checkout/apply — the same shape shows up across very different stacks.
A double-attribution race in a usage-based billing service: two events land in the same window, both incrementing the counter independently.
A team-invite TOCTOU: token validation and team-membership write that are not in the same transaction.
A refund loop: refunding a partially-shipped order while the shipment status is racing means you could refund and still receive the merchandise.
A scope-creep race on our own scope proxy — found by pointing race-02 at our own staging. We were the first target it broke; we love it for that.

what this means for you

If you have a billing service, a refund flow, or any state machine that does multi-step writes through a non-serializable transaction, you have race-condition surface. race-02 is enabled by default for every project; it will probe your checkout and refund routes in the first 24 hours.

If you would rather it not — say you know your reconciler will catch the double-spend within a minute and you do not want spurious findings — drop the agent class from your project scope. Defaults are sensible; opt-outs are explicit.