Agent design.

How an agent is structured, what tools it has access to, where the scope proxy sits, and how the budget killer keeps the loop honest.

Anatomy of an agent

An agent is a typed bundle: a system prompt, a tool registry, an LLM capability hint, and an eval suite. Each agent class lives under packages/agents/src/<class>/. The base class handles the loop — the agent only ships the parts that are different.

packages/agents/src/idor/index.ts
export const idorAgent = defineAgent({
  name: 'idor',
  capability: 'reasoning-heavy',     // resolves via packages/llm capability registry
  systemPrompt: idorSystemPrompt,
  tools: [enumeratePathsTool, replayWithAccountTool, captureTool, validateTool],
  budget: { wallClockMs: 15 * 60 * 1000, tokens: 200_000 },
  eval: idorEvalCases,
});
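For orientation, the bundle passed to defineAgent could be typed roughly as in the sketch below. Only the field names come from the idor example above; the type names (AgentSpec, ToolRef, EvalCase, Capability), the second capability value, and the defineAgentSketch helper are assumptions for illustration, not the platform's actual definitions.

```typescript
// Hypothetical types: only the field names mirror the idor example.
type Capability = "reasoning-heavy" | "fast"; // "fast" is an invented second value

interface ToolRef {
  name: string;
}

interface Budget {
  wallClockMs: number;
  tokens: number;
}

interface EvalCase {
  name: string;
}

interface AgentSpec {
  name: string;
  capability: Capability;
  systemPrompt: string;
  tools: ToolRef[];
  budget: Budget;
  eval: EvalCase[];
}

// Sketch: sanity-check the bundle and freeze it. The loop itself is assumed
// to live in the base class and is not modeled here.
function defineAgentSketch(spec: AgentSpec): Readonly<AgentSpec> {
  if (spec.budget.wallClockMs <= 0 || spec.budget.tokens <= 0) {
    throw new Error(`agent ${spec.name}: budgets must be positive`);
  }
  const toolNames = new Set(spec.tools.map((t) => t.name));
  if (toolNames.size !== spec.tools.length) {
    throw new Error(`agent ${spec.name}: duplicate tool names`);
  }
  return Object.freeze({ ...spec });
}
```

Freezing the returned object is one way to keep an agent definition declarative: nothing mutates the bundle after registration.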

Tools

Tools are the only way agents touch the world. They're typed (Zod in, JSON out), audited, and routed exclusively through the scope proxy. We deliberately keep the registry small — five tools cover every agent today.

  • enumeratePaths — reads the OpenAPI / HAR fixture, returns a candidate route table.
  • replayWithAccount — sends a request as a labeled synthetic account.
  • capture — records a (request, response) tuple to the run's audit log.
  • validate — replays a captured exploit against a fresh account and asserts impact.
  • note — append-only scratchpad. Survives across steps; never leaves the run.
agents do not call fetch()
Agents have no direct network access. Every tool that needs the wire goes through the scope proxy. This is enforced at the sandbox level — an agent that tries fetch() gets a connection refused.
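One way to picture "typed in, JSON out, audited" is a wrapper that validates the input, runs the tool, and appends an audit record on every call. The sketch below uses a hand-written parser as a stand-in for a Zod schema so that it has no dependencies; defineTool, AuditEntry, and the internals of the note tool are assumptions, not the platform's real code.

```typescript
// Hypothetical shapes: the platform's real tool interface is not shown in this document.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface AuditEntry {
  tool: string;
  input: Json;
  output: Json;
}

interface ToolDef<I extends Json, O extends Json> {
  name: string;
  parse: (raw: unknown) => I; // stand-in for a Zod schema's parse()
  run: (input: I) => O;
}

const auditLog: AuditEntry[] = [];

// Wrap a tool so every call is validated first and audited afterwards.
function defineTool<I extends Json, O extends Json>(def: ToolDef<I, O>) {
  return (raw: unknown): O => {
    const input = def.parse(raw); // throws on malformed input, so nothing is audited
    const output = def.run(input);
    auditLog.push({ tool: def.name, input, output });
    return output;
  };
}

// Example: an append-only scratchpad in the spirit of the note tool.
const scratchpad: string[] = [];

const note = defineTool<{ text: string }, { count: number }>({
  name: "note",
  parse: (raw) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("note: expected an object");
    }
    const text = (raw as { text?: unknown }).text;
    if (typeof text !== "string" || text.length === 0) {
      throw new Error("note: text must be a non-empty string");
    }
    return { text };
  },
  run: ({ text }) => {
    scratchpad.push(text);
    return { count: scratchpad.length };
  },
});
```

Calling note({ text: "candidate route found" }) appends to the scratchpad and leaves one audit entry; a malformed input throws before the tool body runs.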

The scope proxy

The scope proxy is the L4/L7 firewall between every agent and every customer target. It terminates TLS, validates that the destination matches an active project's allow-list, rate-limits per project, and rewrites authentication so agents never see raw customer tokens.

// happy path
agent → tool.replay({ path: "/v1/users/2050/api_keys", account: "tenant-a" })
       └→ proxy.guard(scope, project, account) → upstream.acme.dev
                                              ↘ capture(req, res) → audit_log
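The allow-list and rate-limit checks in that guard step can be sketched as a pure function. This is a minimal illustration under stated assumptions: the Project fields, the GuardResult shape, exact-hostname matching, and the sliding one-minute window are all invented here, and TLS termination and authentication rewriting are left out.

```typescript
// Hypothetical project record: field names are assumptions for illustration.
interface Project {
  id: string;
  active: boolean;
  allowedHosts: string[]; // exact hostnames that are in scope
  maxRequestsPerMinute: number;
}

type GuardResult =
  | { ok: true; host: string }
  | { ok: false; reason: "inactive_project" | "out_of_scope" | "rate_limited" };

// Per-project request timestamps for a sliding one-minute window.
const recentRequests = new Map<string, number[]>();

function guard(project: Project, targetUrl: string, nowMs: number): GuardResult {
  if (!project.active) return { ok: false, reason: "inactive_project" };

  // Compare the parsed hostname, never a substring of the raw URL.
  // new URL() throws on a malformed URL, which also keeps the request off the wire.
  const host = new URL(targetUrl).hostname;
  if (!project.allowedHosts.includes(host)) {
    return { ok: false, reason: "out_of_scope" };
  }

  const windowStart = nowMs - 60_000;
  const hits = (recentRequests.get(project.id) ?? []).filter((t) => t > windowStart);
  if (hits.length >= project.maxRequestsPerMinute) {
    recentRequests.set(project.id, hits);
    return { ok: false, reason: "rate_limited" };
  }
  hits.push(nowMs);
  recentRequests.set(project.id, hits);
  return { ok: true, host };
}
```

Matching on the parsed hostname matters: a prefix check on the raw string would accept a lookalike such as upstream.acme.dev.evil.example.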

The budget killer

Every run carries two budgets: wallClockMs and tokens. The killer runs as a side process; if either budget is exceeded, it emits an agent/run.cancel event and the workflow tears down between steps. There is no soft cap and no warning state: the run dies cleanly.

why both budgets
Tokens alone are gameable: an agent that loops on tiny tool calls can rack up wall-clock without crossing a token threshold. Wall-clock alone misses the run that spent six minutes hallucinating a 50k-token response. Both, hard.
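The two-budget check itself is small. A minimal sketch, assuming the killer polls a usage record for each run: the event name agent/run.cancel and the budget field names come from this page, while RunUsage, the event payload, and the checkBudget function are hypothetical.

```typescript
interface Budget {
  wallClockMs: number;
  tokens: number;
}

// Hypothetical usage record that the killer would poll.
interface RunUsage {
  startedAtMs: number;
  tokensUsed: number;
}

// The payload shape is an assumption; only the event name appears in the docs.
interface CancelEvent {
  name: "agent/run.cancel";
  data: { runId: string; reason: "wall_clock" | "tokens" };
}

// Returns a cancel event as soon as either budget is exceeded, otherwise null.
// There is deliberately no warning tier between "fine" and "cancelled".
function checkBudget(
  runId: string,
  budget: Budget,
  usage: RunUsage,
  nowMs: number,
): CancelEvent | null {
  if (nowMs - usage.startedAtMs > budget.wallClockMs) {
    return { name: "agent/run.cancel", data: { runId, reason: "wall_clock" } };
  }
  if (usage.tokensUsed > budget.tokens) {
    return { name: "agent/run.cancel", data: { runId, reason: "tokens" } };
  }
  return null;
}
```

With the idor budget above, a run that loops on tiny tool calls for sixteen minutes is cancelled for wall_clock even if it has used only a few thousand tokens.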

The validator

Before a hypothesis becomes a finding, the validator re-runs the exploit from a clean process state, with a freshly-provisioned synthetic account, and asserts the same primitive. If the second run doesn't reproduce, the hypothesis is logged as not_confirmed and you never see it.

  • Validator runs in a separate sandbox — no shared state with the original run.
  • Both runs must capture identical impact (same fields exfiltrated, same auth bypassed).
  • Findings carry the validator's captured response, not the agent's. So the proof you read is the proof we kept.
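The "identical impact" rule above can be sketched as a set comparison between the original run and the validator's re-run. The Impact shape and the confirm function are assumptions for illustration; not_confirmed is the status named on this page, and "confirmed" is assumed as its counterpart.

```typescript
// Hypothetical summary of what an exploit achieved in one run.
interface Impact {
  fieldsExfiltrated: string[];
  authBypassed: string[];
}

type Verdict = "confirmed" | "not_confirmed";

// Order-insensitive equality: both runs must touch exactly the same items.
function sameSet(a: string[], b: string[]): boolean {
  const left = new Set(a);
  const right = new Set(b);
  return left.size === right.size && Array.from(left).every((x) => right.has(x));
}

// `rerun` is null when the exploit did not reproduce at all in the clean sandbox.
function confirm(original: Impact, rerun: Impact | null): Verdict {
  if (rerun === null) return "not_confirmed";
  const identical =
    sameSet(original.fieldsExfiltrated, rerun.fieldsExfiltrated) &&
    sameSet(original.authBypassed, rerun.authBypassed);
  return identical ? "confirmed" : "not_confirmed";
}
```

A re-run that leaks only a subset of the original fields counts as not_confirmed under this sketch, which matches the strict reading of "identical impact".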

Evals

Every agent has a pinned eval suite under evals/cases/. We run them before any prompt or tool change merges, and we track success rate against evals/baselines.json. A regression of 3% or more against a baseline blocks the change.

evals/cases/idor-basic.yaml
name: idor-basic-tenant-bleed
target: fixtures/multi-tenant-api
expected:
  - finding.confirmed.path: /v1/users/{id}/api_keys
  - finding.severity: critical
  - finding.cross_tenant: true
budget:
  wall_clock_ms: 600000
  tokens: 80000
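The merge gate that compares eval results against evals/baselines.json could look something like the sketch below. Several things are assumed: that baselines map a suite name to a success rate between 0 and 1, that "3%" means three percentage points, and that a suite with no current result counts as a total failure. The function name and the Rates type are hypothetical.

```typescript
// Assumed format: suite name -> success rate in [0, 1].
type Rates = Record<string, number>;

// Assumes the 3% threshold is measured in percentage points.
const MAX_DROP = 0.03;
const EPSILON = 1e-9; // absorbs floating-point noise at the boundary

// Returns the suites whose success rate dropped by the threshold or more.
function findRegressions(baseline: Rates, current: Rates): string[] {
  const blocked: string[] = [];
  for (const name of Object.keys(baseline)) {
    const before = baseline[name] ?? 0;
    const after = current[name] ?? 0; // a missing result is treated as total failure
    if (before - after >= MAX_DROP - EPSILON) blocked.push(name);
  }
  return blocked;
}
```

A non-empty result would block the change; an empty array lets it through.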

Adding an agent

  • Define the agent under packages/agents/src/<class>/.
  • Wire it into the registry in packages/agents/src/index.ts.
  • Add at least three eval cases under evals/cases/.
  • Run pnpm eval. Update evals/baselines.json if the new agent is intentionally novel.
  • Open the PR. CI blocks if attack-success-rate regresses on any other agent.