SERVICE · 02

RAG Pipelines

Hybrid retrieval with citations and drift detection — answers your auditors can read.

WHAT YOU GET

Hybrid index

Dense embeddings + BM25, fused with a learned reranker. Recall stays sane when the corpus grows.
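As a rough illustration of the fusion step, here is a minimal sketch that merges a dense result list and a BM25 result list with reciprocal rank fusion (RRF). RRF is a common baseline swapped in for illustration, not necessarily what ships; a learned reranker would then re-score the fused candidates. The function name, the document ids, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each list contributes 1 / (k + rank) per document, so documents
    ranked highly by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Rank-based fusion avoids having to calibrate cosine similarities against BM25 scores, which live on different scales.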

Source-linked answers

Every claim cites its source span. Faithfulness + grounding evals run nightly.
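One way to picture a source-linked claim is a citation that carries a document id plus character offsets, which a deterministic check can verify against the stored text. This is a sketch under assumptions: the `Citation` fields, the `is_grounded` helper, and the sample corpus are all hypothetical, and a production check would likely tolerate whitespace or normalization differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doc_id: str
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    quote: str   # text the answer claims to be citing


def is_grounded(citation, corpus):
    """Return True if the cited span exists and matches the quote exactly."""
    text = corpus.get(citation.doc_id)
    if text is None:
        return False  # cited document does not exist
    if not (0 <= citation.start < citation.end <= len(text)):
        return False  # offsets fall outside the document
    return text[citation.start:citation.end] == citation.quote


# Hypothetical corpus and citations.
corpus = {"policy.txt": "Refunds are issued within 14 days of purchase."}
quote = "Refunds are issued within 14 days"

good = Citation("policy.txt", 0, len(quote), quote)
bad = Citation("policy.txt", 0, len(quote), "Refunds are issued within 30 days")
missing = Citation("other.txt", 0, 5, "Hello")

print(is_grounded(good, corpus), is_grounded(bad, corpus), is_grounded(missing, corpus))
```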

Drift detection

Embedding distribution + retrieval-hit-rate are monitored. Alerts fire before answers degrade.
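The two monitored signals can be sketched as a simple check: compare the centroid of recent query embeddings against a baseline, and compare the recent retrieval hit rate against its baseline. This is a minimal sketch, assuming centroid cosine distance as the distribution measure; the thresholds, function names, and alert labels are hypothetical, and a real monitor might use a richer statistic.

```python
import math


def centroid(vectors):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def drift_alerts(baseline_vecs, recent_vecs, baseline_hit_rate, recent_hit_rate,
                 max_centroid_shift=0.05, max_hit_rate_drop=0.10):
    """Return the names of any drift signals that crossed their threshold."""
    alerts = []
    shift = cosine_distance(centroid(baseline_vecs), centroid(recent_vecs))
    if shift > max_centroid_shift:
        alerts.append("embedding_drift")
    if baseline_hit_rate - recent_hit_rate > max_hit_rate_drop:
        alerts.append("hit_rate_drop")
    return alerts


# Hypothetical 2-d embeddings: recent traffic points in a different direction.
baseline = [[1.0, 0.0], [0.9, 0.1]]
recent = [[0.0, 1.0], [0.1, 0.9]]

print(drift_alerts(baseline, recent, baseline_hit_rate=0.92, recent_hit_rate=0.90))
```

Here the embedding shift fires while the hit rate is still healthy, which is the "alert before answers degrade" case.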

02 · Reference architecture

What we actually ship.

Every system we build follows this shape. Client at the edge, tools in a sandbox, traces everywhere, evaluators gating output. No black boxes, no "it works on my machine."

EDGE
Client: web · mobile · API
Gateway: auth · rate-limit · PII redact

ORCHESTRATION
Orchestrator: planner · router · memory (durable · exactly-once)

AGENTS & TOOLS
Retrieval: hybrid · rerank · cite
Tool calls: sandboxed · timeboxed
Evaluator: gates · rubrics · LLM-judge

DATA & TRUST
Vector + BM25: tenant-isolated
Traces / logs: OTel · replayable
Signed output: auditable · rollback

04 · Run

Watch an agent do the job.

Three real production scenarios, replayed at observed latency. Every box is a span; every span has tokens, cost, and an eval gate. This is what shows up in your traces, not a marketing animation.

POST /api/v1/agent/run
7-minute ambient consult → clinician-ready note with source-linked citations.

trace · clinical
span_id 7c1f…

orchestrator.run · 0ms
PLAN · orchestrator.plan · orchestrator · 80ms · 178 tok · $0.0013

Plan rationale

Scribe pipeline: ASR with medical lexicon, retrieve patient context + template, draft per-SOAP-section, evaluator gates clinical-safety claims.

Subtasks
tool.asr · retrieval.context · reasoner · evaluator

Budgets: latency 3.00s · cost $0.025
EVAL GATE · deterministic + LLM-judge

READY · clinician review
SOAP note drafted. 14 citations resolved. 2 low-confidence claims surfaced for clinician.
latency 3.12s · cost $0.022 · tokens 3,104 · evals 9/9

09 · STACK

Modern tools, composed cleanly.

Models: Claude, GPT-4, Llama 3
Retrieval: pgvector, Qdrant, BM25
Eval: Promptfoo, Braintrust
Observability: OpenTelemetry, Langfuse
Vector index: Embeddings · hybrid
Storage: Postgres, S3, R2

10 · FAQ

FAQ · RAG Pipelines

Fixed-price, scoped to two weeks. We can share a rate sheet on request — the goal is that you leave with something concrete (architecture + spike) regardless of whether you continue with us.

Faithfulness (does the answer follow from cited docs?), grounding (are citations real?), and recall (did we find the right docs?). Every release runs the suite.
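Of the three metrics, recall is the most mechanical to compute. The sketch below shows recall@k for one labeled query and a release gate over a small suite; the function names, the sample document ids, and the 0.5 threshold are assumptions for illustration, not the actual suite.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant docs that appear in the top-k retrieved docs."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined without any relevant documents")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def suite_passes(cases, k, min_recall):
    """Gate a release: every labeled query must reach the minimum recall@k."""
    return all(recall_at_k(c["retrieved"], c["relevant"], k) >= min_recall
               for c in cases)


# Hypothetical labeled queries: retrieved ids in rank order, plus gold ids.
cases = [
    {"retrieved": ["d1", "d7", "d3", "d9"], "relevant": ["d3", "d4"]},
    {"retrieved": ["d2", "d5", "d8", "d6"], "relevant": ["d2", "d5"]},
]

first = recall_at_k(cases[0]["retrieved"], cases[0]["relevant"], k=3)
print(first, suite_passes(cases, k=3, min_recall=0.5))
```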

START

Ship the first system.

Fixed-price discovery in 2 weeks. You leave with an architecture, a working spike, and a build plan.