The problem
The broker was hand-reviewing ~38,000 pre-trade alerts per week. Reviewers were missing the long tail. The compliance team wanted machine triage — but not the kind that gets explained to the regulator as "the model decided."
We needed a system that could justify every block in plain language, with citations.
— Head of Compliance, Fintech, Europe
Our approach
We didn't replace the reviewer — we replaced the triage queue. The system explains its reasoning span-by-span; the reviewer accepts or kicks back.
Three decisions shaped the system:
- Typed envelopes between agents. Every agent-to-agent message has a versioned schema. Codegen handles the boilerplate.
- Deterministic eval gates. A regression in rule coverage stops the deploy automatically.
- Full audit trail. Trace IDs on every span. A reviewer can replay any decision exactly.
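To make the three decisions above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the production code: the `Envelope` fields, the `SCHEMA_VERSION` value, and the `coverage_gate` function and its tolerance are all hypothetical names chosen for this example.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

SCHEMA_VERSION = "1.0"  # hypothetical version string for this sketch


@dataclass(frozen=True)
class Envelope:
    """Hypothetical agent-to-agent message: versioned, immutable, traceable."""
    schema_version: str
    trace_id: str
    sender: str
    payload: dict


def make_envelope(sender: str, payload: dict,
                  trace_id: Optional[str] = None) -> Envelope:
    # Reuse the caller's trace ID so every hop of one decision shares it;
    # otherwise start a new trace.
    return Envelope(SCHEMA_VERSION, trace_id or uuid.uuid4().hex, sender, payload)


def validate(env: Envelope) -> None:
    # Reject messages written against a schema version this agent does not know.
    if env.schema_version != SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema version {env.schema_version!r}, "
            f"expected {SCHEMA_VERSION!r}"
        )


def coverage_gate(baseline: float, candidate: float,
                  tolerance: float = 0.0) -> bool:
    """Return True if the deploy may proceed.

    Deterministic: the same two coverage numbers always give the same answer.
    A candidate whose rule coverage falls below baseline - tolerance is blocked.
    """
    return candidate >= baseline - tolerance
```

In a deploy script, a `False` from `coverage_gate` would translate into a non-zero exit code, which is what stops the release without anyone having to notice the regression by hand.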
The architecture
What we actually ship.
Every system we build follows this shape. Client at the edge, tools in a sandbox, traces everywhere, evaluators gating output. No black boxes, no "it works on my machine."
A run, on this system
Watch an agent do the job.
Three real production scenarios, replayed at observed latency. Every box is a span; every span has tokens, cost, and an eval gate. This is what shows up in your traces, not a marketing animation.
Pre-trade review requires applicable-rules retrieval, a market-data lookup, a position check, and a deterministic evaluator gate.
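The four steps can be sketched as a small traced pipeline. This is a hedged illustration, not the shipped system: the step functions are stubs returning made-up values, and `Span`, `Trace`, `review`, the rule ID, and the position-limit check are hypothetical names and logic invented for the example. Token and cost fields are omitted for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Span:
    name: str
    trace_id: str
    duration_ms: float
    output: dict


@dataclass
class Trace:
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    def run(self, name: str, step: Callable[[dict], dict], ctx: dict) -> dict:
        # Time the step and record its output under the shared trace ID.
        start = time.perf_counter()
        out = step(ctx)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.spans.append(Span(name, self.trace_id, elapsed_ms, out))
        return out


# Stub steps: a real system would call a rule store, a market-data feed,
# and a position service here.
def retrieve_rules(ctx: dict) -> dict:
    return {"rules": ["RULE-POSITION-LIMIT"]}  # hypothetical rule ID


def lookup_market_data(ctx: dict) -> dict:
    return {"last_price": 101.5}  # made-up price


def check_position(ctx: dict) -> dict:
    return {"position_after": ctx["position"] + ctx["order"]["qty"]}


def evaluator_gate(ctx: dict) -> dict:
    # Deterministic: plain comparison, no model call, so a replay gives
    # the same decision for the same inputs.
    breach = ctx["position_after"] > ctx["limit"]
    return {
        "decision": "block" if breach else "pass",
        "cited_rules": ctx["rules"] if breach else [],
    }


def review(order: dict, position: int, limit: int,
           trace_id: str) -> Tuple[str, Trace]:
    trace = Trace(trace_id)
    ctx = {"order": order, "position": position, "limit": limit}
    steps = [
        ("rules_retrieval", retrieve_rules),
        ("market_data", lookup_market_data),
        ("position_check", check_position),
        ("evaluator_gate", evaluator_gate),
    ]
    for name, step in steps:
        ctx.update(trace.run(name, step, ctx))
    return ctx["decision"], trace
```

Calling `review({"symbol": "XYZ", "qty": 500}, position=800, limit=1000, trace_id="t-1")` in this sketch yields a block decision with the cited rule attached, and `trace.spans` holds one entry per step for later replay.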
Outcomes
They shipped a system that survived our peak weeks. The evals caught two regressions that would have made it to production otherwise.
— VP of Engineering, Fintech, Europe