Directing agents.

Brink runs a fleet of focused, autonomous attackers against your app. This page is the operator's manual: what each agent does, how to point them, and how to read their work.

The fleet

Each agent is a specialist — one class of bug, one set of instincts. When you spin up the fleet they all run in parallel against the same project, each with its own sandbox, budget, and tool access. You don't pick "an agent" — you pick a project, and the fleet picks itself.

IDOR — hunts insecure direct object references. Enumerates resources owned by one account, then asks for them as another. Cross-tenant leaks are its specialty.
Auth bypass — probes session handling, role checks, and permission gates. Header manipulation, method swaps, content-type tricks, missing default-deny.
SSRF — finds outbound HTTP call sites that interpolate user input. Cloud-metadata endpoints, internal IPs, alternate-scheme payloads.
Stored XSS — looks for writable surfaces (names, comments, uploads) that render unescaped in a different user's context. Confirms execution before reporting.
Business logic — race conditions, state-machine violations, pricing math, voucher abuse, anything that's "valid HTTP but wrong outcome".
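The IDOR agent's core move (enumerate as one account, then request as another) can be sketched against a toy in-memory API. Everything here is hypothetical: the `ToyApi` class, the invoice records, and the `find_cross_tenant_reads` helper only illustrate the shape of the check and are not Brink's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    id: int
    owner: str  # account label that owns this record


class ToyApi:
    """Hypothetical API with a deliberately missing ownership check."""

    def __init__(self, invoices):
        self._invoices = {inv.id: inv for inv in invoices}

    def list_invoices(self, account):
        # Correctly filtered: only the caller's own records.
        return [inv.id for inv in self._invoices.values() if inv.owner == account]

    def get_invoice(self, account, invoice_id):
        # Bug: returns the record without comparing its owner to the caller.
        return self._invoices.get(invoice_id)


def find_cross_tenant_reads(api, owner, attacker):
    """Enumerate ids as `owner`, then request each one as `attacker`."""
    leaked = []
    for invoice_id in api.list_invoices(owner):
        record = api.get_invoice(attacker, invoice_id)
        if record is not None and record.owner != attacker:
            leaked.append(invoice_id)
    return leaked


api = ToyApi([Invoice(1, "tenant_a"), Invoice(2, "tenant_a"), Invoice(3, "tenant_b")])
leaks = find_cross_tenant_reads(api, owner="tenant_a", attacker="tenant_b")
```

With the ownership check missing, every record enumerated as `tenant_a` is readable as `tenant_b`, which is why the check needs two distinct accounts to work at all.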

we keep the fleet small on purpose

Fewer, sharper agents outperform a long tail of half-baked ones. Every agent is held to a baseline attack-success-rate; one that drops below it gets fixed before it ships, not added to an "experimental" pile that confuses you.

How an agent decides what to do

Each run starts the same way: the agent reads the project context (scope, credentials, linked repos), picks an entrypoint to probe, forms a hypothesis ("I bet this route is missing a tenant check"), and iterates. It uses the credentials you provided to act as real users, the scope rules to stay on-target, and — if you linked source repos — your actual code to skip the guessing phase.

Agents have hard budgets on time and tokens. They settle automatically: a stuck loop dies cleanly, a productive loop runs to the end of its budget. You're never billed for "the agent thought about it" — only for what actually moved the run forward.
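A minimal sketch of that settling behavior, assuming a hypothetical `step` callable that reports tokens spent and whether it made progress. The `Budget` fields and the stall cutoff are illustrative assumptions, not Brink's actual settings.

```python
import time
from dataclasses import dataclass


@dataclass
class Budget:
    max_seconds: float
    max_tokens: int
    max_stalled_steps: int = 3  # assumed cutoff for a "stuck" loop


def run_with_budget(step, budget, clock=time.monotonic):
    """Call `step()` until the budget is spent or the loop stalls.

    `step` is a hypothetical callable returning (tokens_used, made_progress).
    Returns (reason, tokens_spent).
    """
    started = clock()
    tokens_spent = 0
    stalled = 0
    while True:
        if clock() - started >= budget.max_seconds:
            return "time_exhausted", tokens_spent
        if tokens_spent >= budget.max_tokens:
            return "tokens_exhausted", tokens_spent
        tokens_used, made_progress = step()
        tokens_spent += tokens_used
        # Reset the stall counter on progress, otherwise count the dead step.
        stalled = 0 if made_progress else stalled + 1
        if stalled >= budget.max_stalled_steps:
            return "stalled", tokens_spent


# A loop that never makes progress is cut off after three steps.
stuck = run_with_budget(lambda: (10, False), Budget(60.0, 1_000))
# A productive loop runs until its token budget is spent.
productive = run_with_budget(lambda: (100, True), Budget(60.0, 1_000))
```

The stuck loop stops early and cheaply, while the productive one is allowed to use everything it was given.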

Directing the fleet

Agents are autonomous, but they take direction. Four levers control what they do — most projects only ever touch the first three.

1 · Scope

Set under Edit scope on the project page. Agents only probe paths that match your allowlist, and never touch blocked methods. Tightening scope makes the fleet sharper — a narrow surface gets probed deeper before the budget runs out.

Allowed paths — what's in play. Default /v1/* covers most JSON APIs.
Blocked methods — destructive verbs default off. Leave them off unless you've talked to the team owning that endpoint.
Rate limit — agent fleet's outbound cap. Bump it once you've confirmed your origin handles the volume.
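As a rough sketch of how the first two settings combine, the check below treats allowed paths as shell-style globs and blocked methods as a simple set. The `Scope` class, the `in_scope` helper, the glob semantics, and the specific list of blocked verbs are all assumptions made for illustration; the rate limit is left out.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Scope:
    # Default allowlist taken from the text above.
    allowed_paths: list = field(default_factory=lambda: ["/v1/*"])
    # Assumed set of destructive verbs; the real defaults may differ.
    blocked_methods: set = field(default_factory=lambda: {"DELETE", "PUT", "PATCH"})


def in_scope(scope, method, path):
    """True if the method is not blocked and the path matches the allowlist."""
    if method.upper() in scope.blocked_methods:
        return False
    # fnmatchcase's `*` also matches `/`, so "/v1/*" covers nested paths.
    return any(fnmatchcase(path, pattern) for pattern in scope.allowed_paths)


default_scope = Scope()
narrow_scope = Scope(allowed_paths=["/v1/invoices/*"])
```

Under these assumptions, `narrow_scope` rejects `/v1/users/42` while `default_scope` accepts it, which is the trade described above: less surface, probed more deeply.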

2 · Credentials

Add at least two synthetic accounts to each project. Cross-tenant bugs (the highest-impact class) can only be found when an agent can act as one user asking for another's data. Labels matter: agents reason about tenant_a vs tenant_b differently than user_1 vs user_2.

never use real customer accounts

Synthetic accounts only. Agents will request, mutate, and probe with these credentials. The whole point is that nothing real is at risk if something goes sideways.

Token values are never shown to the agent — the agent sees only the labels you set ("backend-admin", "tenant-a"). When it wants to send a request as that user, it picks the label; our infrastructure attaches the token server-side. Tokens never appear in the agent's prompt, its traces, or its findings.
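The label-only flow can be pictured roughly as below. This is a sketch under stated assumptions: `AgentRequest` and `CredentialVault` are hypothetical names, and attaching the token as a Bearer header is just one possibility (your app may carry it in a cookie instead).

```python
from dataclasses import dataclass, field


@dataclass
class AgentRequest:
    """What the agent produces: a label, never a token."""
    method: str
    path: str
    as_user: str  # credential label, e.g. "tenant-a"


@dataclass
class CredentialVault:
    """Host-side store; a hypothetical stand-in for the real infrastructure."""
    _tokens: dict = field(default_factory=dict, repr=False)

    def add(self, label, token):
        self._tokens[label] = token

    def labels(self):
        # Only labels are exposed to the agent.
        return sorted(self._tokens)

    def attach(self, request):
        # Resolve the label to a token outside the agent's view.
        token = self._tokens[request.as_user]
        return {
            "method": request.method,
            "path": request.path,
            "headers": {"Authorization": f"Bearer {token}"},  # assumed scheme
        }


vault = CredentialVault()
vault.add("tenant-a", "secret-a")
vault.add("backend-admin", "secret-admin")

request = AgentRequest("GET", "/v1/invoices/1", as_user="tenant-a")
outbound = vault.attach(request)
```

The agent-side object (`request`) carries only the label, so anything derived from it, such as a trace or a finding, has no token to leak.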

3 · Linked source repos

The single biggest force-multiplier. Connect GitHub from Settings → Integrations, then link the relevant repos on the project page — frontend, backend, infra, whatever applies. Agents read your actual code to:

Find the route definitions instead of guessing at URLs.
Read your auth middleware and spot the tenant check that isn't there.
Inspect recent diffs (last 50 commits) to focus on freshly-shipped code.

We mount linked repos read-only inside each agent's sandbox. Tokens never touch the sandbox; nothing writes back to your repo. When the fix-PR feature ships, write actions will run host-side under a separate, scoped permission grant that you can revoke at any time.
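One way to picture the "recent diffs" focus is to rank files by how often they appear in the last 50 commits. The sketch below parses the output shape of `git log -n 50 --name-only --pretty=format:`; the sample paths and the `rank_recently_changed` helper are hypothetical, and the agents' real prioritization is not documented here.

```python
from collections import Counter

# Output shape of `git log -n 50 --name-only --pretty=format:`:
# one changed path per line, blank lines between commits.
SAMPLE_LOG = """\
api/routes/invoices.py
api/middleware/auth.py

api/routes/invoices.py

web/src/Billing.tsx
"""


def rank_recently_changed(log_output, top=3):
    """Return the most frequently touched paths, most-changed first."""
    paths = [line.strip() for line in log_output.splitlines() if line.strip()]
    return [path for path, _ in Counter(paths).most_common(top)]


hot_files = rank_recently_changed(SAMPLE_LOG)
```

Files that change often in recent history are more likely to contain freshly shipped, less-reviewed code, so they make reasonable first targets.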

agents that can read your code find bugs ~3× faster

Same fleet, same budget, two configurations: one with source mounted, one without. The source-mounted runs file confirmed exploits in a fraction of the tool calls because they spend time on hypotheses, not on rediscovering your URL structure.

4 · Context

Every project has an optional context note — the "tribal knowledge" field in the wizard. Put anything an agent would benefit from knowing on day one:

examples of useful context

Tenant boundary is by subdomain (acme.app.com, beta.app.com).
JWT is in cookies, not Authorization header.
Skip /v1/payments — triggers real Stripe charges.
Admin user is at /super, not /admin.
The "owner" role can read but not write to /v1/billing.

Cheap to write, very high leverage. A two-line context note frequently eliminates a whole hour of misdirected probing.

Watching agents work

Open the project page while the fleet is running. The Live events tab streams every tool call as it happens: HTTP requests, source reads, recon notes, repo mapping, hypothesis formation. It's the unfiltered transcript of what the agent is thinking about.

For deep-dives, click any run to open the run detail page. Every step is captured — the prompt, the model's response, the tool result, the audit log. You can see exactly what the agent saw, in the order it saw it. This is how you tell "the agent gave up too early" from "the surface really doesn't have a bug of this class".

Reading findings

A finding is a hypothesis that survived the validator — a second, independent run that reproduced the exploit from a clean state. Hypotheses you never see; findings show up in the inbox with everything needed to triage:

Reproduction — exact request, exact response. The proof you read is the proof we kept.
Severity — assigned by the validator, based on what was actually leaked or bypassed. Not the agent's guess.
Recommended fix — pulled from the source if you've linked a repo; generic guidance otherwise.
PR link — once GitHub is connected and the fix-PR feature is enabled for your project, the finding ships with an opened PR against the offending file.
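The validator step, where a hypothesis becomes a finding only if it reproduces from a clean state, can be sketched as follows. The `validate` function, the state layout, and both example exploits are hypothetical and exist only to show why replaying from scratch filters out flukes.

```python
def validate(exploit, make_clean_state):
    """Replay `exploit` against a freshly built state.

    `exploit` is a hypothetical callable that takes a state and returns the
    evidence it obtained, or None if the attack did not work.
    """
    evidence = exploit(make_clean_state())
    if evidence is None:
        return None  # hypothesis discarded, never shown
    return {"reproduction": evidence, "validated": True}


def clean_state():
    # Built from scratch on every call, so nothing carries over between runs.
    return {"invoices": {1: {"owner": "tenant_a", "total": 120}}, "sessions": {}}


def real_exploit(state):
    # Reads tenant_a's invoice without any leftover context.
    return state["invoices"].get(1)


def stale_exploit(state):
    # Only worked because of a session left over from the original run.
    return state["sessions"].get("leftover")


finding = validate(real_exploit, clean_state)
discarded = validate(stale_exploit, clean_state)
```

Only `real_exploit` survives; `stale_exploit` depended on state that a clean environment does not have, so it never reaches the inbox.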

Tuning the fleet

Fleet not finding anything? Check scope — over-restrictive paths starve the agents of surface. Loosen the allowlist, add more synthetic accounts, link more repos.
Fleet too noisy on the activity feed? That's normal in the first 24 hours of a new project — agents are mapping the unknown. After ~1h the feed quiets to confirmed-finding cadence.
Want continuous coverage? Schedule recurring fleets (coming soon). Most teams run a fleet on every PR merge against staging and a nightly fleet against production.
Specific class missing? All five classes run on every fleet today. Per-class selection (e.g. "IDOR only after a schema migration") is a planned per-project setting.

Where to next

New here? → quickstart walks you from a staging URL to your first confirmed exploit.
Want the mental model? → concepts covers projects, scopes, runs, findings, coverage.
Scripting from CI? → the CLI mirrors the dashboard.