Agent design.
How an agent is structured, what tools it has access to, where the scope proxy sits, and how the budget killer keeps the loop honest.
Anatomy of an agent
An agent is a typed bundle: a system prompt, a tool registry, an LLM capability hint, and an eval suite. Each agent class lives under packages/agents/src/<class>/. The base class handles the loop — the agent only ships the parts that are different.
packages/agents/src/idor/index.ts
export const idorAgent = defineAgent({
  name: 'idor',
  capability: 'reasoning-heavy', // resolves via packages/llm capability registry
  systemPrompt: idorSystemPrompt,
  tools: [enumeratePathsTool, replayWithAccountTool, captureTool, validateTool],
  budget: { wallClockMs: 15 * 60 * 1000, tokens: 200_000 },
  eval: idorEvalCases,
});
Tools
Tools are the only way agents touch the world. They're typed (Zod in, JSON out), audited, and routed exclusively through the scope proxy. We deliberately keep the registry small — five tools cover every agent today.
- enumeratePaths: reads the OpenAPI / HAR fixture, returns a candidate route table.
- replayWithAccount: sends a request as a labeled synthetic account.
- capture: records a (request, response) tuple to the run's audit log.
- validate: replays a captured exploit against a fresh account and asserts impact.
- note: append-only scratchpad. Survives across steps; never leaves the run.
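To make "typed in, JSON out, audited" concrete, here is a minimal sketch of how one tool call could be wrapped. It is not the repository's actual API: `Tool`, `callTool`, and `AuditEntry` are hypothetical names, and a hand-written `parse` function stands in for the Zod schema so the example has no dependencies.

```typescript
// Hypothetical shapes for illustration; the real tool registry may differ.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface AuditEntry {
  tool: string;
  input: Json;
  output: Json;
}

interface Tool<I extends Json> {
  name: string;
  // Stand-in for a Zod schema: throws on invalid input, returns typed input.
  parse(raw: unknown): I;
  run(input: I): Json;
}

// Validates the raw input, runs the tool, and records the call in the audit log.
// Invalid input throws before anything is run or logged.
function callTool<I extends Json>(tool: Tool<I>, raw: unknown, audit: AuditEntry[]): Json {
  const input = tool.parse(raw);
  const output = tool.run(input);
  audit.push({ tool: tool.name, input, output });
  return output;
}

// An append-only scratchpad in the spirit of the note tool.
const notes: string[] = [];
const noteTool: Tool<{ text: string }> = {
  name: "note",
  parse(raw) {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("note: expected an object");
    }
    const text = (raw as { text?: unknown }).text;
    if (typeof text !== "string" || text.length === 0) {
      throw new Error("note: text must be a non-empty string");
    }
    return { text };
  },
  run(input) {
    notes.push(input.text);
    return { stored: notes.length };
  },
};
```

Calling `callTool(noteTool, { text: "try tenant-b ids" }, auditLog)` would append one note and one audit entry; passing `{ text: 42 }` throws and leaves both untouched.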
fetch() gets a connection refused.
The scope proxy
The scope proxy is the L4/L7 firewall between every agent and every customer target. It terminates TLS, validates that the destination matches an active project's allow-list, rate-limits per project, and rewrites authentication so agents never see raw customer tokens.
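The allow-list and rate-limit checks can be sketched as a pure function over a project's scope. This is an illustration under assumptions, not the proxy's real code: `ProjectScope`, `ScopeGuard`, the fixed one-second window, and exact-match host comparison are all hypothetical, and TLS termination and authentication rewriting are left out.

```typescript
// Hypothetical types; the real proxy's configuration is not shown in this document.
interface ProjectScope {
  projectId: string;
  active: boolean;
  allowedHosts: string[]; // exact host names on the allow-list
  maxRequestsPerWindow: number;
}

type GuardResult =
  | { allowed: true }
  | { allowed: false; reason: "inactive_project" | "out_of_scope" | "rate_limited" };

interface RateWindow {
  startMs: number;
  count: number;
}

const WINDOW_MS = 1000; // assumed window size for this sketch

class ScopeGuard {
  // One fixed rate-limit window per project id.
  private windows: { [projectId: string]: RateWindow | undefined } = {};

  check(scope: ProjectScope, host: string, nowMs: number): GuardResult {
    if (!scope.active) {
      return { allowed: false, reason: "inactive_project" };
    }
    const target = host.toLowerCase();
    if (!scope.allowedHosts.some((h) => h === target)) {
      return { allowed: false, reason: "out_of_scope" };
    }
    // Start a new window when none exists or the current one has expired.
    let current = this.windows[scope.projectId];
    if (current === undefined || nowMs - current.startMs >= WINDOW_MS) {
      current = { startMs: nowMs, count: 0 };
      this.windows[scope.projectId] = current;
    }
    if (current.count >= scope.maxRequestsPerWindow) {
      return { allowed: false, reason: "rate_limited" };
    }
    current.count += 1;
    return { allowed: true };
  }
}
```

Passing the clock in as `nowMs` keeps the check deterministic, so the same function can be exercised in an eval fixture without real time passing.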
// happy path
agent → tool.replay({ path: "/v1/users/2050/api_keys", account: "tenant-a" })
  └→ proxy.guard(scope, project, account) → upstream.acme.dev
      ↘ capture(req, res) → audit_log
The budget killer
Every run carries two budgets: wallClockMs and tokens. The killer runs as a side process; if either budget is breached, it emits an agent/run.cancel event and the workflow tears down between steps. There is no soft cap and no warning state: the run dies cleanly.
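The check the killer performs can be sketched as a single function that returns either a cancel event or nothing. The event name comes from the text above; the payload shape, the `RunUsage` type, and the choice to treat "breached" as strictly greater than the budget are assumptions made for the sketch.

```typescript
interface Budget {
  wallClockMs: number;
  tokens: number;
}

// Hypothetical usage snapshot the killer would read for a run.
interface RunUsage {
  startedAtMs: number;
  tokensUsed: number;
}

// Hypothetical payload; only the event name appears in this document.
interface CancelEvent {
  name: "agent/run.cancel";
  data: { runId: string; reason: "wall_clock" | "tokens" };
}

// Returns a cancel event as soon as either budget is exceeded, otherwise null.
// There is deliberately no warning tier between "fine" and "cancelled".
function checkBudget(
  runId: string,
  budget: Budget,
  usage: RunUsage,
  nowMs: number
): CancelEvent | null {
  if (nowMs - usage.startedAtMs > budget.wallClockMs) {
    return { name: "agent/run.cancel", data: { runId, reason: "wall_clock" } };
  }
  if (usage.tokensUsed > budget.tokens) {
    return { name: "agent/run.cancel", data: { runId, reason: "tokens" } };
  }
  return null;
}
```

With the idor budget of 15 minutes and 200,000 tokens, a run that has used 200,001 tokens after one second yields a cancel event with reason "tokens".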
The validator
Before a hypothesis becomes a finding, the validator re-runs the exploit from a clean process state, with a freshly provisioned synthetic account, and asserts the same primitive. If the second run doesn't reproduce, the hypothesis is logged as not_confirmed and you never see it.
- Validator runs in a separate sandbox — no shared state with the original run.
- Both runs must capture identical impact (same fields exfiltrated, same auth bypassed).
- Findings carry the validator's captured response, not the agent's. So the proof you read is the proof we kept.
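The "identical impact" rule can be sketched as a comparison between the impact captured by the original run and the one captured by the validator's replay. The `Impact` shape and its field names are hypothetical, and the sketch adds one assumption of its own: a replay that shows no impact at all never confirms a finding.

```typescript
// Hypothetical impact record; the real capture format is not shown in this document.
interface Impact {
  fieldsExfiltrated: string[];
  authBypassed: string[];
}

type Verdict = "confirmed" | "not_confirmed";

// De-duplicates and sorts into a new array so comparison ignores order and repeats.
function normalize(values: string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i).sort();
}

function sameSet(a: string[], b: string[]): boolean {
  const left = normalize(a);
  const right = normalize(b);
  return left.length === right.length && left.every((v, i) => v === right[i]);
}

// Confirms only when the replay reproduces exactly the same impact.
function judge(original: Impact, replay: Impact): Verdict {
  // Assumption: an empty impact is never treated as a reproduction.
  const hasImpact =
    replay.fieldsExfiltrated.length > 0 || replay.authBypassed.length > 0;
  const identical =
    sameSet(original.fieldsExfiltrated, replay.fieldsExfiltrated) &&
    sameSet(original.authBypassed, replay.authBypassed);
  return hasImpact && identical ? "confirmed" : "not_confirmed";
}
```

A replay that leaks only a subset of the original fields is judged not_confirmed, which matches the rule that both runs must capture the same impact rather than a similar one.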
Evals
Every agent has a pinned eval suite under evals/cases/. We run them before any prompt or tool change merges, and we track success rate against evals/baselines.json. A 3%+ regression on a baseline blocks the change.
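The regression gate can be sketched as a comparison of current success rates against the stored baselines. The layout of evals/baselines.json is not shown here, so the flat name-to-rate map is hypothetical, and the sketch assumes "3%" means three percentage points of absolute success rate rather than a relative drop.

```typescript
// Hypothetical layout: eval or agent name mapped to a success rate in [0, 1].
type SuccessRates = { [name: string]: number };

// Assumption: "3%" is read as three percentage points (absolute).
const MAX_REGRESSION = 0.03;
// Small tolerance so floating-point noise at the boundary does not hide a regression.
const EPSILON = 1e-9;

// Returns the names whose success rate dropped by the threshold or more.
// A name missing from the current results is treated as a success rate of 0.
function findRegressions(baselines: SuccessRates, current: SuccessRates): string[] {
  const failing: string[] = [];
  for (const name of Object.keys(baselines)) {
    const base = baselines[name] ?? 0;
    const now = current[name] ?? 0;
    if (base - now >= MAX_REGRESSION - EPSILON) {
      failing.push(name);
    }
  }
  return failing;
}
```

A change would be blocked whenever the returned list is non-empty. Improvements and new entries that have no baseline yet pass through untouched.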
evals/cases/idor-basic.yaml
name: idor-basic-tenant-bleed
target: fixtures/multi-tenant-api
expected:
  - finding.confirmed.path: /v1/users/{id}/api_keys
  - finding.severity: critical
  - finding.cross_tenant: true
budget:
  wall_clock_ms: 600000
  tokens: 80000
Adding an agent
- Define the agent under packages/agents/src/<class>/.
- Wire it into the registry in packages/agents/src/index.ts.
- Add at least three eval cases under evals/cases/.
- Run pnpm eval. Update evals/baselines.json if the new agent is intentionally novel.
- Open the PR. CI blocks if attack-success-rate regresses on any other agent.