changelog

What we shipped, when, and why.

No marketing-shaped 'we improved performance' filler. If it changed your inbox, your bill, or your agents — it's here.

  1. 2026-05-22
    v0.41.2
    fix

    Severity display: no more "undefined" on legacy findings

    Findings ingested before the severity rework (Q1 2026) sometimes rendered "undefined" in the inbox. The resolver now falls back to the finding's original CVSS bucket when no severity is set, and 31 historical rows have been backfilled.
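
    A minimal sketch of that fallback. It assumes a finding carries an optional `severity` and an optional CVSS base score. The field names, `resolveSeverity`, and the `"unknown"` last resort are hypothetical. The buckets follow the CVSS v3.x qualitative severity rating scale.

```typescript
// Hypothetical shapes: field names are illustrative, not the product's schema.
type Severity = "none" | "low" | "medium" | "high" | "critical";

interface Finding {
  id: string;
  severity?: Severity; // set for findings ingested after the rework
  cvssScore?: number;  // CVSS base score, 0.0 to 10.0
}

// CVSS v3.x qualitative severity rating scale.
function cvssBucket(score: number): Severity {
  if (score === 0) return "none";
  if (score < 4.0) return "low";
  if (score < 7.0) return "medium";
  if (score < 9.0) return "high";
  return "critical";
}

// Prefer the explicit severity, then the CVSS bucket. The "unknown" case is
// an assumption so that `undefined` never reaches the renderer.
function resolveSeverity(f: Finding): Severity | "unknown" {
  if (f.severity !== undefined) return f.severity;
  if (f.cvssScore !== undefined) return cvssBucket(f.cvssScore);
  return "unknown";
}
```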

  2. 2026-05-18
    v0.41.1
    platform

    Removed Caddy from the production stack

    We were using Caddy as a thin TLS terminator in front of Hono. The upstream egress proxy now terminates TLS directly, which removes a hop and about 14ms from p50 latency.

  3. 2026-05-14
    v0.41.0
    agents

    New agent: race-02

    Second-generation race-condition agent. Targets coupon stacking, double-spend, and TOCTOU patterns in checkout/refund flows. Caught its first prod-shaped exploit (coupon stacking on /v1/checkout/apply) within 6h of going live in the eval harness.
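
    The check-then-act (TOCTOU) shape that race-02 looks for fits in a few lines. This is an illustrative sketch, not the /v1/checkout/apply handler. `Order`, `lookupCouponValid`, `fireTwice`, and both apply functions are hypothetical, and `await Promise.resolve()` stands in for a database round trip. The racy variant checks, yields, then writes, so two concurrent requests can both pass the check.

```typescript
// Hypothetical in-memory order; a real checkout would persist this.
interface Order {
  id: string;
  couponsApplied: number;
  claimed: boolean; // only used by the safe variant
}

// Stands in for an async lookup (e.g. a DB read) that yields to other requests.
async function lookupCouponValid(_code: string): Promise<boolean> {
  await Promise.resolve();
  return true;
}

// Vulnerable: the check and the write are separated by an await.
async function applyCouponRacy(order: Order, code: string): Promise<boolean> {
  if (order.couponsApplied > 0) return false;  // time of check
  const valid = await lookupCouponValid(code); // other calls run here
  if (!valid) return false;
  order.couponsApplied += 1;                   // time of use
  return true;
}

// Safer: claim the slot synchronously (no await between check and set),
// and release the claim if validation fails.
async function applyCouponSafe(order: Order, code: string): Promise<boolean> {
  if (order.claimed) return false;
  order.claimed = true;
  const valid = await lookupCouponValid(code);
  if (!valid) {
    order.claimed = false;
    return false;
  }
  order.couponsApplied += 1;
  return true;
}

// Fires two concurrent applies at a fresh order and reports how many stuck.
async function fireTwice(
  apply: (o: Order, c: string) => Promise<boolean>,
): Promise<number> {
  const order: Order = { id: "o1", couponsApplied: 0, claimed: false };
  await Promise.all([apply(order, "SAVE10"), apply(order, "SAVE10")]);
  return order.couponsApplied;
}
```

    `fireTwice(applyCouponRacy)` resolves to 2 and `fireTwice(applyCouponSafe)` resolves to 1. Against a real database the equivalent fix is an atomic conditional update or a unique constraint, since an in-process flag does not protect multiple server instances.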

  4. 2026-05-09
    v0.40.4
    security

    Scope proxy: enforce blocked methods at the L4 layer

    Previously, blocked HTTP methods were caught by the L7 inspector after the request had already crossed the proxy boundary. They are now blocked at SYN, which preserves the customer's SOC 2 boundary even if an agent gets clever.

  5. 2026-05-02
    v0.40.0
    feature

    Inbox triage queue with bulk PR actions

    /findings now supports multi-select with bulk actions: open PRs, mark as triage, mark as fixed. Powered by the new run.bulkAction mutation. Shipped behind a feature flag for design partners.
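
    A rough sketch of what a bulk mutation like run.bulkAction has to handle: gate on the feature flag, dedupe the selection, and report IDs it could not find. The function name `bulkAction`, the status names, and the in-memory `Map` store are assumptions for illustration. The real mutation's signature isn't described here, and `open_pr` only flips a status in this sketch rather than opening a PR.

```typescript
// Hypothetical action and status names mirroring the changelog entry.
type BulkAction = "open_pr" | "mark_triage" | "mark_fixed";
type FindingStatus = "new" | "triage" | "pr_open" | "fixed";

interface BulkResult {
  updated: string[];
  missing: string[];
}

const NEXT_STATUS: Record<BulkAction, FindingStatus> = {
  open_pr: "pr_open", // a real implementation would also create the PR
  mark_triage: "triage",
  mark_fixed: "fixed",
};

function bulkAction(
  store: Map<string, FindingStatus>,
  ids: string[],
  action: BulkAction,
  flagEnabled: boolean,
): BulkResult {
  if (!flagEnabled) throw new Error("bulk actions are behind a feature flag");
  const result: BulkResult = { updated: [], missing: [] };
  // Dedupe the selection so a finding picked twice is only updated once.
  for (const id of Array.from(new Set(ids))) {
    if (!store.has(id)) {
      result.missing.push(id);
      continue;
    }
    store.set(id, NEXT_STATUS[action]);
    result.updated.push(id);
  }
  return result;
}
```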

  6. 2026-04-25
    v0.39.0
    agents

    authn-01 learns refresh-token reuse

    authn-01 now hypothesizes refresh-token reuse across rotated kid values. Caught a real cross-tenant token-replay window in two of our four design-partner staging tenants within 48h.
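
    For context on what authn-01 probes, here is a sketch of server-side checks that close a refresh-token replay window. It covers rotation with reuse detection, where a second redemption of the same token revokes its whole chain. It also rejects tokens signed under a retired `kid` and checks that the token's tenant matches the request. `RefreshTokenGuard`, the claim names, and the in-memory sets are hypothetical, and the claims are assumed to be signature-verified already.

```typescript
// Hypothetical claim set; signature verification is assumed to have happened.
interface RefreshClaims {
  jti: string;    // unique token id
  family: string; // rotation chain id
  kid: string;    // signing key id from the token header
  tenant: string;
}

class RefreshTokenGuard {
  private readonly activeKids: Set<string>;
  private readonly used = new Set<string>();
  private readonly revokedFamilies = new Set<string>();

  constructor(activeKids: Iterable<string>) {
    this.activeKids = new Set(activeKids);
  }

  // Returns true if the token may be exchanged for a new token pair.
  redeem(t: RefreshClaims, requestTenant: string): boolean {
    if (!this.activeKids.has(t.kid)) return false; // signed by a retired key
    if (t.tenant !== requestTenant) return false;  // cross-tenant replay
    if (this.revokedFamilies.has(t.family)) return false;
    if (this.used.has(t.jti)) {
      // Reuse of an already-rotated token: treat as theft, kill the chain.
      this.revokedFamilies.add(t.family);
      return false;
    }
    this.used.add(t.jti);
    return true;
  }
}
```

    Whether a just-retired key should be honored for a grace period during rotation is a policy choice; this sketch rejects it outright.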

  7. 2026-04-19
    v0.38.2
    fix

    Webhook validator: stop emitting stale finding IDs

    The validator was returning the *previous* run's finding ID when a hypothesis was re-confirmed within a 90s window. This affected ~0.6% of confirmed findings; no incorrect alerts went out, but inbox dedupe was unreliable.
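
    The entry doesn't spell out the root cause. One common way this class of bug arises is a dedupe cache keyed by hypothesis alone, so a re-confirmation inside the window returns whatever ID the last run stored. The hypothetical sketch below keys by run and hypothesis instead. `FindingIdCache`, `resolve`, and `mint` are illustrative names.

```typescript
// Hypothetical dedupe cache; only the 90s window comes from the changelog.
const WINDOW_MS = 90 * 1000;

interface Entry {
  findingId: string;
  at: number; // ms timestamp when the ID was minted
}

class FindingIdCache {
  private readonly entries = new Map<string, Entry>();

  // Keying by run AND hypothesis means a re-confirmation in a new run
  // cannot pick up the previous run's finding ID.
  resolve(runId: string, hypothesisKey: string, now: number, mint: () => string): string {
    const key = `${runId}:${hypothesisKey}`;
    const hit = this.entries.get(key);
    if (hit && now - hit.at < WINDOW_MS) return hit.findingId;
    const findingId = mint();
    this.entries.set(key, { findingId, at: now });
    return findingId;
  }
}
```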