Every Tzar.Tech engagement ends with 60 days of overlap on-call. Both teams — ours and the client's — share the pager. We thought of this as a goodwill gesture early on. It turned out to be the most important phase of the engagement.
Here's why.
What the eval suite doesn't catch
The eval suite catches what you thought to test. By definition, it does not catch what you didn't.
For most agentic systems shipped to production, the first 60 days surface:
- Inputs no one anticipated. A user uploads a 400-page PDF in Bulgarian. A customer hits the endpoint with HTML-escaped quotes. An integration sends timestamps in microseconds.
- Frequency assumptions that were wrong. A "rare" path turns out to be 8% of traffic. A "common" path turns out to be 0.1%.
- Tail latency that the eval load didn't reproduce. p99 looks fine at 100 RPS in load tests; at 250 RPS in production, p99.9 spikes due to a connection pool you sized for the wrong distribution.
- Cost behaviors that emerge under real prompts. Real users phrase questions in ways that triple the prompt token count vs. your synthetic eval prompts.
These don't show up in even a well-designed eval suite, because the eval suite is a model of what you know. The first 60 days is when the system meets reality.
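To make the connection-pool item concrete, here is a rough sketch using Little's law (average connections in use = request rate × time each connection is held). The 200 ms hold time and the pool of 30 are made-up numbers for illustration, not figures from any engagement, and the function names are ours.

```python
def connections_in_use(rps: float, hold_ms: float) -> float:
    """Little's law: average busy connections = arrival rate x hold time."""
    return rps * hold_ms / 1000.0


def pool_headroom(pool_size: int, rps: float, hold_ms: float) -> float:
    """Spare connections on average; negative means requests queue for a slot."""
    return pool_size - connections_in_use(rps, hold_ms)


if __name__ == "__main__":
    POOL_SIZE = 30   # hypothetical pool size chosen from load-test results
    HOLD_MS = 200    # hypothetical time a request holds a connection
    for rps in (100, 250):
        print(rps, "RPS ->", pool_headroom(POOL_SIZE, rps, HOLD_MS), "spare connections")
```

With these assumed numbers the pool has 10 spare connections at 100 RPS and is 20 short at 250 RPS. This is only the average: real hold times have a tail, so a pool can run dry well before the average says it should, which is one way the spike hides from a load test.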
What overlap on-call actually does
Two things, both important:
1. The leaving team is still wired in when things go wrong. When the new symptom appears, the engineer who knows the code is on the call. Not in a Slack channel two timezones away — actually paged, actively looking. The fix gets shipped instead of becoming a ticket.
2. The receiving team learns the failure modes in their own context. They get paged. They go through the runbook (see the runbook that ships with every agent). They escalate to us when they need to. Each incident becomes a guided exercise in operating the system, with the original authors on the line.
By day 60, the receiving team has handled enough incidents that they know what's hard, what's easy, what to ignore, and what's worth paging an executive over.
What we don't do
We don't run primary on-call. We're the secondary during the overlap. The client's team owns the system. Our role is "the senior on-call who can answer 'is this normal' and 'what's the safe path here.'"
This matters: if we held primary, the client's team would never develop the muscle. They'd page us first, every time, and at day 60 they'd still be dependent. The overlap is a teaching window, not a service.
What the data says
We've kept stats on this across engagements. The shape is consistent:
- First two weeks: roughly half the pages reach us. Most are clarifications, not real incidents.
- Weeks 3–4: roughly a quarter reach us. The client's team has internalized common diagnoses.
- Weeks 5–8: under 10% reach us. Those that do are genuinely new failure modes worth a code change.
After 60 days, the median engagement has us paged once a month for the next quarter, and almost never after that. The system is theirs.
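If you want to track the same shape for your own handoff, a minimal sketch looks like this. It assumes you can export each page as a day offset plus a flag for whether it reached the secondary; the `Page` type, the phase boundaries in days, and the sample pages are all illustrative assumptions, not our real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Page(NamedTuple):
    day: int         # days since the overlap started (0-based)
    escalated: bool  # True if the page reached the secondary on-call


# Phase boundaries as half-open day ranges [start, end); assumed mapping of weeks to days.
PHASES = [("weeks 1-2", 0, 14), ("weeks 3-4", 14, 28), ("weeks 5-8", 28, 60)]


def escalation_share(pages: Iterable[Page]) -> Dict[str, float]:
    """Fraction of pages in each phase that reached the secondary."""
    totals = defaultdict(int)
    escalated = defaultdict(int)
    for page in pages:
        for name, start, end in PHASES:
            if start <= page.day < end:
                totals[name] += 1
                escalated[name] += int(page.escalated)
                break
    # Phases with no pages are omitted rather than reported as zero.
    return {name: escalated[name] / totals[name] for name, _, _ in PHASES if totals[name]}


if __name__ == "__main__":
    # Made-up sample pages, for illustration only.
    sample = [Page(1, True), Page(3, False), Page(20, True), Page(21, False),
              Page(22, False), Page(25, False), Page(40, False)]
    print(escalation_share(sample))
```

A share that stays flat from one phase to the next is worth noticing: it suggests the receiving team is still paging the secondary by default.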
Why 60 days, not 30
We tried 30 once. It wasn't enough.
Several failure modes only show up at month boundaries (billing cycles, monthly batch jobs, end-of-quarter reporting loads). A 30-day overlap can complete without ever encountering them, leaving the client's team blind to a category of issues that will happen, eventually, when there's no senior backup to call.
60 days catches the month-boundary failures and gives the receiving team a full second month to handle them as primary, with us still on the line as backup. It's not symmetric and it's not arbitrary; it's what we've found is enough to genuinely transfer ownership.
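The calendar arithmetic behind this is easy to check. The sketch below assumes "encountering a month boundary" means a first-of-month date falls inside the overlap window, which is our simplification; real billing or batch schedules may key off other days. It tries every possible start date in one year (2023, chosen arbitrarily) and reports the worst case.

```python
from datetime import date, timedelta


def month_starts_in_window(start: date, days: int) -> int:
    """Count first-of-month dates in the half-open window [start, start + days)."""
    return sum(
        1 for offset in range(days)
        if (start + timedelta(days=offset)).day == 1
    )


def worst_case(days: int, year: int = 2023) -> int:
    """Fewest month starts seen by any overlap of this length beginning in `year`."""
    first = date(year, 1, 1)
    last = date(year, 12, 31)
    starts = (first + timedelta(days=i) for i in range((last - first).days + 1))
    return min(month_starts_in_window(s, days) for s in starts)


if __name__ == "__main__":
    for length in (30, 60):
        print(length, "day overlap, worst case month starts:", worst_case(length))
```

Under that definition, `worst_case(30)` is 0 and `worst_case(60)` is 1: a 30-day overlap that starts on January 2 ends on January 31 without ever crossing into February, while any 60-day overlap is guaranteed to cross at least one month start.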
What this costs and why we do it
Sixty days of secondary on-call is real engineering time we don't bill for separately. We price it into the engagement. Some clients don't believe us when we say it's mandatory. We insist anyway, and the ones who try to skip it usually call us back at month two asking us to come back as consultants.
We'd rather just stay for 60 days the first time.