SERVICE · 03

Durable Execution

Sagas + retries that survive restarts. Replay against any input, debug any failure.

WHAT YOU GET

Stateful workflows

Steps persist between calls. A node restart doesn't lose your in-flight orders.
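
As a rough illustration of how persisted steps can survive a restart, here is a minimal Python sketch. It assumes a hypothetical file-backed `StepJournal` (not the API of Temporal, Inngest, or any other product named on this page) that records each step's result and skips steps already recorded. The step names `reserve_stock` and `charge_card` are invented for the example.

```python
import json
import os
import tempfile
from typing import Any, Callable


class StepJournal:
    """Hypothetical journal: one JSON file holding the results of finished steps."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.results: dict[str, Any] = {}
        # After a restart, reload whatever was completed before the crash.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.results = json.load(fh)

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        # A step that already completed is not executed again;
        # its recorded result is returned instead.
        if name in self.results:
            return self.results[name]
        result = fn()
        self.results[name] = result
        self._flush()
        return result

    def _flush(self) -> None:
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written journal behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.results, fh)
        os.replace(tmp_path, self.path)


def demo_restart() -> list[str]:
    """Run one step, simulate a restart, and report which steps really executed."""
    calls: list[str] = []

    def reserve_stock() -> str:
        calls.append("reserve")
        return "reserved"

    def charge_card() -> str:
        calls.append("charge")
        return "charged"

    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "order-42.json")
        before = StepJournal(path)
        before.run_step("reserve_stock", reserve_stock)

        # Simulated node restart: a fresh object reloads the journal from disk.
        after = StepJournal(path)
        after.run_step("reserve_stock", reserve_stock)  # skipped, already recorded
        after.run_step("charge_card", charge_card)
    return calls
```

In this sketch a step that crashes after doing its work but before the journal is flushed would run again on restart, so steps should be idempotent; a production runtime handles that case with more care than a single JSON file can.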

Retry budgets

Per-step retry limits with exponential backoff. Steps that exhaust their budget go to a dead-letter queue.
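
A per-step retry budget might look roughly like the Python sketch below. The names `RetryBudget`, `DeadLetterQueue`, and `run_with_budget`, and the default values (4 attempts, 0.5 s base delay, factor 2), are assumptions made for illustration, not settings taken from this page or from any specific runtime.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class RetryBudget:
    """Hypothetical per-step budget: attempt limit plus backoff parameters."""

    max_attempts: int = 4
    base_delay_s: float = 0.5
    factor: float = 2.0
    max_delay_s: float = 30.0

    def delay_before_retry(self, failed_attempt: int) -> float:
        # Exponential backoff: base * factor^(n-1), capped at max_delay_s.
        raw = self.base_delay_s * self.factor ** (failed_attempt - 1)
        return min(raw, self.max_delay_s)


@dataclass
class DeadLetterQueue:
    """In-memory stand-in for a real dead-letter queue."""

    items: list[dict[str, Any]] = field(default_factory=list)

    def put(self, step: str, payload: Any, error: str) -> None:
        self.items.append({"step": step, "payload": payload, "error": error})


def run_with_budget(
    step: str,
    fn: Callable[[Any], Any],
    payload: Any,
    budget: RetryBudget,
    dlq: DeadLetterQueue,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call fn(payload) until it succeeds or the budget is exhausted."""
    last_error: Optional[Exception] = None
    for attempt in range(1, budget.max_attempts + 1):
        try:
            return fn(payload)
        except Exception as exc:  # a sketch; real code would filter retryable errors
            last_error = exc
            if attempt < budget.max_attempts:
                sleep(budget.delay_before_retry(attempt))
    # Budget exhausted: park the payload for inspection instead of dropping it.
    dlq.put(step, payload, repr(last_error))
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Adding random jitter to each delay is a common refinement and is left out here for brevity.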

Replay debug

Capture inputs at every step; replay against a known-bad input to reproduce failures exactly.
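
The capture-and-replay idea can be sketched in a few lines of Python. `Recorder`, `replay`, and the `parse_amount` step are hypothetical names introduced for this example; the sketch assumes steps are deterministic functions of their input.

```python
import copy
from typing import Any, Callable


class Recorder:
    """Hypothetical recorder: stores each step's input before the step runs."""

    def __init__(self) -> None:
        self.captured: list[tuple[str, Any]] = []

    def call(self, name: str, fn: Callable[[Any], Any], payload: Any) -> Any:
        # Deep-copy first, so later mutation of the payload cannot alter the record.
        self.captured.append((name, copy.deepcopy(payload)))
        return fn(payload)


def replay(
    captured: list[tuple[str, Any]],
    steps: dict[str, Callable[[Any], Any]],
) -> list[tuple[str, str, Any]]:
    """Re-run every recorded step against its recorded input."""
    outcomes: list[tuple[str, str, Any]] = []
    for name, payload in captured:
        try:
            outcomes.append((name, "ok", steps[name](copy.deepcopy(payload))))
        except Exception as exc:
            outcomes.append((name, "error", repr(exc)))
    return outcomes


def parse_amount(order: dict[str, str]) -> float:
    # Fails on inputs such as "12,50" (comma used as the decimal separator).
    return float(order["amount"])


if __name__ == "__main__":
    recorder = Recorder()
    try:
        recorder.call("parse_amount", parse_amount, {"amount": "12,50"})
    except ValueError:
        pass  # the failure happened "in production"; the input is already captured

    # Later, on a developer machine: replay the captured input to hit the same error.
    for name, status, detail in replay(recorder.captured, {"parse_amount": parse_amount}):
        print(name, status, detail)
```

Because the input is captured before the step runs, the record exists even when the step raises, which is what makes the known-bad input available for replay.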

02 · Reference architecture

What we actually ship.

Every system we build follows this shape. Client at the edge, tools in a sandbox, traces everywhere, evaluators gating output. No black boxes, no "it works on my machine."

EDGE
Client: web · mobile · API
Gateway: auth · rate-limit · PII redact

ORCHESTRATION
Orchestrator: planner · router · memory
Durable · exactly-once

AGENTS & TOOLS
Retrieval: hybrid · rerank · cite
Tool calls: sandboxed · timeboxed
Evaluator: gates · rubrics · LLM-judge

DATA & TRUST
Vector + BM25: tenant-isolated
Traces / logs: OTel · replayable
Signed output: auditable · rollback

04 · Run

Watch an agent do the job.

Three real production scenarios, replayed at observed latency. Every box is a span; every span has tokens, cost, and an eval gate. This is what shows up in your traces, not a marketing animation.

POST /api/v1/agent/run

First-notice-of-loss arrives via app. Agent must route, set reserve, and request docs in <4s.

trace · claims · span_id 7c1f…
orchestrator.run · 0ms
PLAN · orchestrator.plan · orchestrator · 80ms · 192 tok · $0.0014

Plan rationale

FNOL triage: validate policy in force, run vision on damage photos, fraud sniff against historical patterns, derive severity & reserve, evaluator gates payout-relevant outputs.

Subtasks
retrieval.policy · tool.vision · tool.fraud_check · reasoner.severity · evaluator

LATENCY · budget 3.00s
TOKENS · in + out
COST · budget $0.025
EVAL GATE · deterministic + LLM-judge

ROUTED · reserve $4,800
Severity LOW. Reserve set, body-shop network engaged, 2 docs requested from policyholder.
latency 2.78s · cost $0.018 · tokens 2,210 · evals 7/7
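
To make the budget figures in this trace concrete, here is a small Python sketch of a deterministic gate that compares a run summary with its budgets. The type and function names (`RunSummary`, `RunBudget`, `gate`) are hypothetical; only the numbers (2.78 s against a 3.00 s budget, $0.018 against $0.025, 7 of 7 evals) come from the trace above. The LLM-judge half of the gate is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    latency_s: float
    cost_usd: float
    evals_passed: int
    evals_total: int


@dataclass(frozen=True)
class RunBudget:
    latency_s: float
    cost_usd: float


def gate(summary: RunSummary, budget: RunBudget) -> list[str]:
    """Return a list of violations; an empty list means the run passes the gate."""
    violations: list[str] = []
    if summary.latency_s > budget.latency_s:
        violations.append(
            f"latency {summary.latency_s:.2f}s exceeds budget {budget.latency_s:.2f}s"
        )
    if summary.cost_usd > budget.cost_usd:
        violations.append(
            f"cost ${summary.cost_usd:.3f} exceeds budget ${budget.cost_usd:.3f}"
        )
    if summary.evals_passed < summary.evals_total:
        violations.append(
            f"only {summary.evals_passed}/{summary.evals_total} evals passed"
        )
    return violations


if __name__ == "__main__":
    # Figures from the claims trace: within both budgets, all evals green.
    claims_run = RunSummary(latency_s=2.78, cost_usd=0.018, evals_passed=7, evals_total=7)
    print(gate(claims_run, RunBudget(latency_s=3.00, cost_usd=0.025)))  # prints []
```

Returning the list of violations instead of a bare boolean keeps the reason for a blocked output visible in the trace.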
09 · STACK

Modern tools, composed cleanly.

Runtime
Temporal, Inngest
Observability
OpenTelemetry, Langfuse
Deploy
Vercel, AWS, GCP
Orchestration
Workflows, sagas
10 · FAQ

FAQ · Durable Execution

Eight to fourteen weeks from kick-off to production rollout. Discovery is included if you do it with us; otherwise we work from your existing spec.

Yes. Durable execution captures inputs at every step. Replaying a known-bad input reproduces the failure exactly, including non-deterministic LLM calls when they can be seeded.

START

Ship the first system.

Fixed-price discovery in 2 weeks. You leave with an architecture, a working spike, and a build plan.