INSIGHTS

Writing on agents in production.

What we've learned from shipping. No takes on whether agents are real.

// featured post

Why your eval suite belongs in CI.

If your evals don't run on every PR, they're not evals — they're a screenshot from a meeting in March.

Read · Engineering team
ENGINEERING · APR 28, 2026 · 6 MIN
Durable workflows are not retries.
A reasonable mental model for when you actually need a durable orchestrator vs. just better backoff.
Engineering team
INDUSTRY · APR 14, 2026 · 5 MIN
A2A and MCP: pick both.
They solve adjacent problems. The mistake is treating them as a fork in the road.
Research team
EVALS · APR 02, 2026 · 10 MIN
How to build a retrieval eval set in a week.
A practical recipe: 200 questions, your real corpus, baselines, and a "do not optimize this" set.
Research team
PROCESS · MAR 20, 2026 · 7 MIN
The runbook that ships with every agent.
Failure modes, escalations, rollback conditions. Boring on purpose.
Operations team
PROCESS · MAR 04, 2026 · 4 MIN
Why we run on-call for 60 days after handoff.
Because the first 60 days surface every assumption the eval suite missed.
Operations team
START

Ship the first system.

Fixed-price discovery in 2 weeks. You leave with an architecture, a working spike, and a build plan.