Durable Execution

Stateful workflows

Steps persist between calls. A node restart doesn't lose your in-flight orders.
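As a rough illustration of what "steps persist between calls" can mean in practice, the sketch below checkpoints each completed step to a JSON file and skips it when the run is resumed. The `DurableRun` class, the file-based store, and the `reserve_stock` step are hypothetical names chosen for this example, not the product's actual API.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List


class DurableRun:
    """Checkpoints each completed step to a JSON file (a stand-in for a real store)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state: Dict[str, Any] = {}
        if os.path.exists(path):
            # On restart, reload whatever steps had already completed.
            with open(path, "r", encoding="utf-8") as f:
                self.state = json.load(f)

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # A step that already completed is returned from the checkpoint, not re-run.
        if name in self.state:
            return self.state[name]
        result = fn()
        self.state[name] = result
        self._flush()
        return result

    def _flush(self) -> None:
        # Write to a temp file and rename, so a crash cannot leave a half-written checkpoint.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)


calls: List[str] = []


def reserve_stock() -> Dict[str, Any]:
    calls.append("reserve")
    return {"order": 42, "reserved": True}


path = os.path.join(tempfile.mkdtemp(), "run.json")
DurableRun(path).step("reserve_stock", reserve_stock)

# Simulate a node restart: a fresh object reloads the checkpoint from disk.
resumed = DurableRun(path)
result = resumed.step("reserve_stock", reserve_stock)
```

After the simulated restart, `reserve_stock` has run only once and the resumed run reads its result from the checkpoint. A production system would typically use a database or log rather than a local file.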

Retry budgets

Per-step retry counts + exponential backoff. Dead-letter queues on exhaustion.
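A minimal sketch of a per-step retry budget with exponential backoff and a dead-letter queue, assuming an in-memory list as the queue. The function name `run_with_retries`, its parameters, and the default values are illustrative assumptions rather than the shipped interface.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def run_with_retries(
    fn: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    dead_letters: Optional[List[Dict[str, Any]]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run fn up to max_attempts times, doubling the delay after each failure."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            # Back off before the next attempt; no sleep after the final one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Budget exhausted: park the failure on the dead-letter queue for later inspection.
    if dead_letters is not None:
        dead_letters.append({"error": repr(last_error), "attempts": max_attempts})
    return None


def always_fails() -> None:
    raise RuntimeError("upstream timeout")


delays: List[float] = []
dlq: List[Dict[str, Any]] = []

# Pass delays.append as the sleep function so the example records delays instead of waiting.
outcome = run_with_retries(
    always_fails, max_attempts=3, base_delay=0.1, dead_letters=dlq, sleep=delays.append
)
```

With three attempts and a 0.1s base delay, the example backs off twice (0.1s, then 0.2s) and then records one dead-letter entry. Real implementations often add jitter and a maximum delay cap, which are omitted here for brevity.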

Replay debug

Capture inputs at every step; replay against a known-bad input to reproduce failures exactly.
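The idea can be sketched as a recorder that stores each step's inputs before calling it, plus a replay helper that feeds a recorded input back into the same function. `Recorder`, `replay`, and the `parse_amount` step are hypothetical names for this example, and the sketch assumes step inputs are JSON-serializable.

```python
import json
from typing import Any, Callable, Dict, List


class Recorder:
    """Captures the inputs (and output or error) of every step call."""

    def __init__(self) -> None:
        self.log: List[Dict[str, Any]] = []

    def call(self, step: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        # Round-trip through JSON to snapshot the inputs as they were at call time.
        entry: Dict[str, Any] = {"step": step, "inputs": json.loads(json.dumps(inputs))}
        self.log.append(entry)
        try:
            entry["output"] = fn(**inputs)
            return entry["output"]
        except Exception as exc:
            entry["error"] = repr(exc)
            raise


def replay(entry: Dict[str, Any], fn: Callable[..., Any]) -> Any:
    """Re-run a step against exactly the inputs that were recorded."""
    return fn(**entry["inputs"])


def parse_amount(raw: str) -> float:
    return float(raw.replace(",", ""))


rec = Recorder()
rec.call("parse_amount", parse_amount, raw="1,200.50")
try:
    rec.call("parse_amount", parse_amount, raw="12 EUR")
except ValueError:
    pass  # the failure is already captured in rec.log

# Pick the known-bad entry and replay it to reproduce the failure.
bad = [e for e in rec.log if "error" in e][0]
try:
    replay(bad, parse_amount)
    reproduced = False
except ValueError:
    reproduced = True
```

Replay is only exact for deterministic steps. Steps that call an LLM or read the clock would also need their outputs or seeds recorded, which this sketch does not attempt.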

What we actually ship.

Every system we build follows this shape. Client at the edge, tools in a sandbox, traces everywhere, evaluators gating output. No black boxes, no "it works on my machine."

EDGE

Client

web · mobile · API

Gateway

auth · rate-limit · PII redact

ORCHESTRATION

Orchestrator

planner · router · memory

Durable · exactly-once

AGENTS & TOOLS

Retrieval

hybrid · rerank · cite

Tool calls

sandboxed · timeboxed

Evaluator

gates · rubrics · LLM-judge

DATA & TRUST

Vector + BM25

tenant-isolated

Traces / logs

OTel · replayable

Signed output

auditable · rollback

Watch an agent do the job.

Three real production scenarios, replayed at observed latency. Every box is a span; every span has tokens, cost, and an eval gate. This is what shows up in your traces, not a marketing animation.

POST /api/v1/agent/run

A first notice of loss (FNOL) arrives via the app. The agent must route the claim, set a reserve, and request documents in <4s.

trace · claims · span_id 7c1f…

orchestrator.run · 0ms

PLAN · orchestrator.plan · orchestrator · 80ms · 192 tok · $0.0014

Plan rationale

FNOL triage: validate policy in force, run vision on damage photos, fraud sniff against historical patterns, derive severity & reserve, evaluator gates payout-relevant outputs.

Subtasks

retrieval.policy · tool.vision · tool.fraud_check · reasoner.severity · evaluator

LATENCY · budget 3.00s

COST · budget $0.025

EVAL GATE · deterministic + LLM-judge

ROUTED · reserve $4,800

Severity LOW. Reserve set, body-shop network engaged, 2 docs requested from policyholder.

latency 2.78s · cost $0.018 · tokens 2,210 · evals 7/7
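To make the budget and eval gate concrete, here is a small sketch that checks a run summary against the latency and cost budgets shown in the trace (3.00s and $0.025) and requires every eval to pass. The `RunSummary` type and `gate` function are assumptions made for this example, and the real gate may apply other rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunSummary:
    latency_s: float
    cost_usd: float
    evals_passed: int
    evals_total: int


def gate(run: RunSummary, latency_budget_s: float, cost_budget_usd: float) -> List[str]:
    """Return a list of violations; an empty list means the run may be released."""
    violations: List[str] = []
    if run.latency_s > latency_budget_s:
        violations.append(f"latency {run.latency_s:.2f}s over budget {latency_budget_s:.2f}s")
    if run.cost_usd > cost_budget_usd:
        violations.append(f"cost ${run.cost_usd:.3f} over budget ${cost_budget_usd:.3f}")
    if run.evals_passed < run.evals_total:
        violations.append(f"evals {run.evals_passed}/{run.evals_total} passed")
    return violations


# Figures from the trace summary: 2.78s, $0.018, 7/7 evals.
run = RunSummary(latency_s=2.78, cost_usd=0.018, evals_passed=7, evals_total=7)
violations = gate(run, latency_budget_s=3.00, cost_budget_usd=0.025)

# A hypothetical slower run with one failed eval, for contrast.
slow = RunSummary(latency_s=3.40, cost_usd=0.018, evals_passed=6, evals_total=7)
slow_violations = gate(slow, latency_budget_s=3.00, cost_budget_usd=0.025)
```

The run from the trace passes all three checks, while the hypothetical slow run is held back for both latency and a failed eval.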

FAQ · Durable Execution

01 · How long does a typical build pod run?

Eight to fourteen weeks from kick-off to production rollout. Discovery is included if you do it with us; otherwise we work from your existing spec.

02 · Can workflows be replayed for debugging?

Yes. Durable execution captures inputs at every step. Replaying against a known-bad input reproduces the failure exactly, including non-deterministic LLM calls when they can be seeded.