Three months to a first production AI release.

A Gantt-style diagram of how we structure the first 90 days of a Responsible AI engagement — from kickoff to a real model in production.

People want to know how long it takes. The honest answer is that the first production release lands in about three months, and it is not because we work faster than anyone else. It is because the work overlaps in a specific order. Here is the shape of it.

[Figure: The first 90 days, overlapping by design. Months 1 to 3. Workstreams: Discovery & scope (shadow the work); Governance setup (sign-offs & gates); Eval harness (evals first); Build & integrate (human-in-the-loop); Shadow run (prod, no traffic). Milestone: first release.]
Five workstreams across three months. The dashed orange loop is the eval-to-build feedback path; the marker on the right is the first production release.

Month 1 — Discovery, and governance the same week.

Most engagements start with discovery and bolt governance on at the end, right before launch, as a compliance scramble. We start both in week one. Discovery is not a workshop; it is a pod sitting next to the people who do the work, watching where the time actually goes and which decisions are painful. Governance starts the same week because the only cheap time to decide who signs off on a model's decision is before there is a decision to sign off on. By the end of month one we have chosen one workflow and written a one-page memo naming the human who owns its output. Neither of those is software. Both are prerequisites for software that ships.

Month 2 — The eval harness goes up before the model goes in.

This is the inversion that buys the schedule. We write the evals — the frozen behavioral set, the regression gate, the first red-team probes — before the first line of model integration. The harness is the contract: it is the agreed definition of "good," fixed while everyone is calm, so that nobody renegotiates it at week eleven under deadline pressure. Integration then has a target to hit instead of a feeling to chase. Most of month two is unglamorous plumbing — data access, the retrieval path, the audit log — done against a bar that was set before anyone was tempted to lower it.
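One way to make "fixed while everyone is calm" enforceable rather than aspirational is to fingerprint the frozen behavioral set when it is agreed, and refuse to run against anything else. This is a hypothetical sketch, not our tooling: `fingerprint` and `check_frozen` are illustrative names, and it assumes eval cases are stored as JSON-serializable dicts.

```python
import hashlib
import json


def fingerprint(eval_cases: list[dict]) -> str:
    """SHA-256 over a canonical JSON encoding of the eval set.

    sort_keys makes the hash independent of key order inside each case;
    the order of the cases themselves still matters.
    """
    canonical = json.dumps(eval_cases, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_frozen(eval_cases: list[dict], frozen_hash: str) -> None:
    """Raise if the set no longer matches the hash recorded at freeze time."""
    if fingerprint(eval_cases) != frozen_hash:
        # Changing the bar is allowed, but only through a new, explicit approval.
        raise RuntimeError("eval set changed since freeze; re-approval required")
```

Record the hash next to the sign-off; any later edit to a case, however small, then shows up as a failed check instead of a quiet renegotiation.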

The release date is not the day the code is done. It is the day the evals pass and a named human signs the gate. Those are different days, and we plan for the gap.
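A minimal sketch of that gate, assuming a pass-rate bar agreed when the harness was frozen. `EvalResult`, `SignOff`, and `release_gate` are hypothetical names rather than a specific tool; the point is that passing evals and a named signature are two separate conditions, so the gate can report the gap between them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalResult:
    """Outcome of one case in the frozen behavioral set (hypothetical schema)."""
    case_id: str
    passed: bool


@dataclass(frozen=True)
class SignOff:
    """A named human approving the release (hypothetical schema)."""
    approver: str
    role: str


def release_gate(results: list[EvalResult],
                 baseline_pass_rate: float,
                 sign_off: Optional[SignOff]) -> tuple[bool, str]:
    """Return (is_open, reason). Opens only when the evals clear the agreed
    bar AND a named human has signed; finished code alone is not enough."""
    if not results:
        return False, "no eval results"
    pass_rate = sum(r.passed for r in results) / len(results)
    # Regression gate: never ship below the bar frozen before integration.
    if pass_rate < baseline_pass_rate:
        return False, f"regression: {pass_rate:.2%} < {baseline_pass_rate:.2%}"
    # Governance gate: a missing or blank approver does not count.
    if sign_off is None or not sign_off.approver.strip():
        return False, "evals pass; awaiting named sign-off"
    return True, f"approved by {sign_off.approver} ({sign_off.role})"
```

The middle return value, "evals pass; awaiting named sign-off", is the gap between the two days; planning for it means expecting the gate to sit in that state for a while.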

Month 3 — Shadow first, then a trickle, then traffic.

The model reaches production weeks before it touches a real decision. It runs in shadow — fed the same live inputs as the real process, its outputs logged and compared, trusted with nothing. We watch where it disagrees with the humans and why. Only when the shadow numbers hold do we let it take real decisions, and even then with a person holding the override. The "first release" milestone on the diagram is not the deploy. It is the first time the system makes a real call that matters, with the rollback target named and the on-call watching.
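The shadow run can be sketched as a log that records each live input's human decision next to the model's, always returns the human's, and keeps every disagreement for review. `ShadowLog` and its promotion thresholds are assumptions for illustration; what counts as "the shadow numbers hold" is whatever bar the team agreed up front.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowLog:
    """Accumulates shadow-mode comparisons; the model's output is never acted on."""
    total: int = 0
    disagreements: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, case_id: str, human: str, model: str) -> str:
        """Log one live input and return the decision actually used.

        Always the human's: in shadow the model is trusted with nothing.
        """
        self.total += 1
        if model != human:
            # Keep each disagreement so someone can review *why* they differ.
            self.disagreements.append((case_id, human, model))
        return human

    def agreement_rate(self) -> float:
        if self.total == 0:
            return 0.0
        return 1 - len(self.disagreements) / self.total

    def ready_for_trickle(self, min_cases: int, min_agreement: float) -> bool:
        """Hypothetical promotion rule: enough volume AND agreement holds."""
        return self.total >= min_cases and self.agreement_rate() >= min_agreement
```

Requiring a minimum case count alongside the agreement rate keeps a handful of easy early inputs from promoting the model before it has seen representative traffic.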

Why it overlaps — the schedule is the sequencing.

Three months is not fast work. It is overlapped work, and the order is the whole trick.

  • Governance overlaps discovery so the rules exist before the build can outrun them.
  • The eval harness overlaps the build so the bar is set before the code is allowed to move it.
  • The shadow run overlaps nothing. It is the one stage we never compress, because it is the only stage where the cost of rushing lands on a customer instead of on us.

— Kelvin Tran. Walnut Creek, CA. June 2026.
