Pre-trade surveillance, MiFID II, in the loop.

The problem

The broker was hand-reviewing ~38,000 pre-trade alerts per week. Reviewers were missing the long tail. The compliance team wanted machine triage — but not the kind that gets explained to the regulator as "the model decided."

We needed a system that could justify every block in plain language, with citations.

— Head of Compliance, Fintech, Europe

Our approach

We didn't replace the reviewer — we replaced the triage queue. The system explains its reasoning span-by-span; the reviewer accepts or kicks back.

Three decisions shaped the system:

Typed envelopes between agents. Every agent-to-agent message has a versioned schema. Codegen handles the boilerplate.
Deterministic eval gates. A regression in rule-coverage stops the deploy automatically.
Full audit trail. Trace IDs on every span. A reviewer can replay any decision exactly.
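
To make the first and third decisions concrete, here is a minimal sketch of a typed, versioned envelope that carries a trace ID. It is an illustration under stated assumptions, not the production schema: the field names, the SCHEMA_VERSION value, and the helper functions make_envelope, encode, and decode are all hypothetical, and the real system generates this boilerplate through codegen.

```python
import json
import uuid
from dataclasses import asdict, dataclass

SCHEMA_VERSION = "1.0"  # hypothetical version string


@dataclass(frozen=True)
class Envelope:
    """One agent-to-agent message; every field name here is an assumption."""
    schema_version: str
    trace_id: str
    sender: str
    recipient: str
    payload: dict


def make_envelope(sender, recipient, payload, trace_id=None):
    # Reuse the caller's trace ID so a whole run can be replayed under one trace.
    return Envelope(SCHEMA_VERSION, trace_id or uuid.uuid4().hex,
                    sender, recipient, payload)


def encode(env):
    # sort_keys keeps the serialized form stable across runs.
    return json.dumps(asdict(env), sort_keys=True)


def decode(raw):
    data = json.loads(raw)
    # Reject messages written against a different schema version.
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {data.get('schema_version')!r}")
    return Envelope(**data)
```

A receiving agent would call decode on every inbound message, so a version mismatch fails loudly at the boundary instead of surfacing later as a malformed payload.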

The architecture

What we actually ship.

Every system we build follows this shape. Client at the edge, tools in a sandbox, traces everywhere, evaluators gating output. No black boxes, no "it works on my machine."

EDGE

Client: web · mobile · API

Gateway: auth · rate-limit · PII redact

ORCHESTRATION

Orchestrator: planner · router · memory (durable · exactly-once)

AGENTS & TOOLS

Retrieval: hybrid · rerank · cite

Tool calls: sandboxed · timeboxed

Evaluator: gates · rubrics · LLM-judge

DATA & TRUST

Vector + BM25: tenant-isolated

Traces / logs: OTel · replayable

Signed output: auditable · rollback
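
The deterministic side of the Evaluator, and the deploy-blocking rule-coverage check described earlier, might look roughly like the sketch below. It assumes coverage is tracked as a set of rule IDs and that losing any previously covered rule counts as a regression; the function name coverage_gate and the rule IDs in the usage note are hypothetical.

```python
def coverage_gate(baseline_covered, candidate_covered):
    """Return (ok, lost_rules) for a candidate build.

    ok is False when the candidate no longer covers a rule that the
    baseline covered. The check is pure set arithmetic, so the same
    inputs always produce the same verdict.
    """
    lost = sorted(set(baseline_covered) - set(candidate_covered))
    return (not lost, lost)
```

In a CI job, coverage_gate({"R1", "R2", "R3"}, {"R1", "R3"}) would report R2 as lost, and the pipeline would stop the deploy on the False result.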

A run, on this system

Watch an agent do the job.

Three real production scenarios, replayed at observed latency. Every box is a span; every span has tokens, cost, and an eval gate. This is what shows up in your traces, not a marketing animation.

POST /api/v1/agent/run

Trade desk submits a €4.2M block trade. Agent must reject, approve, or flag in <3s.

trace · compliance · span_id 7c1f…

orchestrator.run · 0ms

PLAN · orchestrator.plan · orchestrator · 80ms · 184 tok · $0.0014

Plan rationale

Pre-trade review requires four steps: applicable-rules retrieval, market-data lookup, position check, and a deterministic evaluator gate.

Subtasks

retrieval · tool.market_data · tool.position_check · evaluator

Budgets: latency 3.00s · cost $0.025

Eval gate: deterministic + LLM-judge
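
The per-run budgets shown for this trace (3.00s latency, $0.025 cost) can be checked by summing span metrics. The sketch below is an assumption about how such a check might work, not the system's actual accounting: the Span class and check_budgets function are hypothetical, and it assumes spans run sequentially so their latencies add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Per-span metrics as displayed in the trace; field names are assumptions."""
    name: str
    latency_ms: float
    tokens: int
    cost_usd: float


def check_budgets(spans, latency_budget_ms=3000.0, cost_budget_usd=0.025):
    # Sequential execution is assumed, so total latency is a plain sum.
    total_latency = sum(s.latency_ms for s in spans)
    total_tokens = sum(s.tokens for s in spans)
    total_cost = sum(s.cost_usd for s in spans)
    return {
        "latency_ms": total_latency,
        "tokens": total_tokens,
        "cost_usd": total_cost,
        "latency_ok": total_latency <= latency_budget_ms,
        "cost_ok": total_cost <= cost_budget_usd,
    }
```

For example, the plan span from this run would be Span("orchestrator.plan", 80, 184, 0.0014); the remaining spans would be appended as each subtask completes.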

APPROVED · 1 flag

Trade clears 11 of 12 rules. Rule 23.2 (book concentration) flagged for desk-head sign-off.

latency 2.41s · cost $0.014 · tokens 1,827 · evals 12/12
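
One way to read the "APPROVED · 1 flag" result is as a deterministic fold over per-rule outcomes. The sketch below assumes three outcomes per rule (pass, flag, fail), that any failure rejects the trade, and that flags alone leave it approved pending sign-off. The Outcome enum, the decide function, and every rule ID other than 23.2 are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FLAG = "flag"   # needs human sign-off, does not block
    FAIL = "fail"   # blocks the trade


def decide(results):
    """Map {rule_id: Outcome} to (status, rule_ids needing attention)."""
    if not results:
        raise ValueError("no rule results to evaluate")
    failed = sorted(r for r, o in results.items() if o is Outcome.FAIL)
    if failed:
        return ("REJECTED", failed)
    flagged = sorted(r for r, o in results.items() if o is Outcome.FLAG)
    return ("APPROVED", flagged)
```

With eleven passing rules and rule 23.2 flagged, decide returns ("APPROVED", ["23.2"]), which matches the shape of the result shown above.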

Outcomes

Review-time cut

−0%

False-positive rate

0,247

Rules covered

They shipped a system that survived our peak weeks. The evals caught two regressions that would have made it to production otherwise.

— VP of Engineering, Fintech, Europe