Artifact · For Operations Leaders

What Ships With Every Operational AI Agent.

Why most agent projects stall in production, and the two documents that prevent it. By Chris Thomas, YNDR.

By Chris Thomas, YNDR · AI Operations Company · 2026-05-19
Why most agent projects stall

The agent works in the demo.
Then production happens.

Industry research consistently shows that more than half of enterprise AI agent projects don't make it past pilot. From the outside, every stall looks like a technical failure: hallucination, latency, integration. From the inside, it's almost never that.

It's that the team supervising the agent and the agent itself were never given a shared written definition of the work. Five of the recurring reasons we see:

  1. Nobody on the team has a shared definition of what the agent does. The PM says one thing, the engineer ships another, the operator who has to supervise it on day 1 finds out at standup.
  2. The agent can do too much. No written tool inventory, no scope boundary, so it confidently does the wrong thing 5% of the time and the team can't tell why.
  3. There's no plan for the first 100 exceptions. The agent meets reality, makes a call the team disagrees with, and the supervisor has nothing to point at as the source of truth.
  4. Governance is treated as a deck slide instead of runtime code. Audit logs aren't shipped with the agent. Kill switch is theoretical.
  5. The team can't tell whether the agent is working. No daily inspection routine, no health-check criteria, no honest place where 'the agent failed' is recorded.

The fix isn't better models. It's two documents.

The two documents

Every agent ships with two manuals.

One is written for the agent. One is written for the humans who supervise it. Both ship at the same time. Both are versioned like code. Neither lives in a Notion doc nobody reads.

Manual · 01 · For the agent

The Agent Handbook

What the agent reads. Every turn.

  • Identity, scope, and the single workflow it owns
  • Tool inventory: what it can call, what it cannot, what requires escalation
  • Decision rules and the explicit boundaries it must not cross
  • Tone, format, and the voice the agent speaks in
  • Eval set: the failures the agent is graded against

The handbook is the agent's own SOP. Versioned, reviewed, and shipped with the agent. Not a system prompt buried in a settings panel.
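
As an illustration of what "tool inventory" can mean in practice, here is a minimal Python sketch of the allow / escalate / deny split held as versioned data. It is not YNDR's template. The class names, the tool names, and the deny-by-default rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    ALLOW = "allow"        # the agent may call the tool on its own
    ESCALATE = "escalate"  # the agent must hand off to a person first
    DENY = "deny"          # the agent may never call the tool


@dataclass(frozen=True)
class ToolInventory:
    """Hypothetical handbook section: every tool, with the rule that applies."""
    version: str
    rules: dict[str, Access]

    def check(self, tool_name: str) -> Access:
        # Assumption: anything not written down is denied by default.
        return self.rules.get(tool_name, Access.DENY)


# Illustrative entries only; these tool names are made up for the sketch.
INVENTORY = ToolInventory(
    version="1.0.0",
    rules={
        "lookup_order": Access.ALLOW,
        "issue_refund": Access.ESCALATE,
        "delete_customer": Access.DENY,
    },
)
```

Keeping the inventory as data next to the agent's code means a scope change shows up as a reviewable diff, which is one way to read "versioned like code".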

Manual · 02 · For your team

The Operations Manual

What your team reads. Day one.

  • Daily inspection routines and what 'healthy' looks like
  • Human-in-the-loop gates: when the agent waits for a person
  • Known failure modes and how each one is recognized
  • Kill-switch and rollback procedures
  • Escalation paths into the rest of the organization

The same operational discipline that makes a new hire productive in week one. Your team owns the agent, not the other way around.
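
To show how a human-in-the-loop gate, a kill switch, and an audit log can live in runtime code rather than on a slide, here is a hedged Python sketch. It assumes the kill switch is a file an operator can create and that the audit log is an append-only JSON Lines file. The function names, decision strings, and file paths are hypothetical, and a real deployment might use a feature flag or a database instead.

```python
import json
import time
from pathlib import Path


def kill_switch_engaged(flag: Path) -> bool:
    # Assumption: an operator engages the switch by creating this file.
    return flag.exists()


def gate(action: str, needs_human: bool, flag: Path, audit: Path) -> str:
    """Decide what happens to a proposed action and record the decision."""
    if kill_switch_engaged(flag):
        decision = "halted"
    elif needs_human:
        decision = "waiting_for_human"
    else:
        decision = "executed"
    # Append one JSON record per decision so the log can be inspected daily.
    record = {"ts": time.time(), "action": action, "decision": decision}
    with audit.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return decision
```

The kill switch is checked first on purpose: once it is engaged, even actions that would normally run unattended are halted and still leave an audit record.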

We ship the manuals. The templates that generate them stay with us. We can publish the table of contents because the value is in the writing, not the structure.

The 30-day cutover

From kickoff to cutover in four weeks.

The cadence we run for a single operational agent. Weeks 1 and 2 are mostly writing. Week 3 is when reality shows up. Week 4 is the cutover.

Week 1

Map the workflow. Write the first draft of the Agent Handbook with the workflow owner. Identify the tools the agent will need and which ones it absolutely will not get.

Week 2

Build the agent. Wire tools. Write the Operations Manual with the team that will supervise. Define daily inspection criteria with the operator who runs the morning standup.

Week 3

HITL review. Agent runs supervised. Team logs the first 50 exceptions. Eval set gets calibrated. Both manuals get a second pass based on what the agent and the team learned.

Week 4

Cut over. Agent runs unsupervised on the lowest-risk slice of the workflow. Daily inspection is on calendar. Audit log lands in the right place. Kill switch is tested, not theoretical.
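
The Week 3 step of turning logged exceptions into an eval set can be sketched as follows. This is a minimal illustration, assuming each exception is stored with the decision the team agreed on in review. `EvalCase`, `pass_rate`, and the stand-in agent function are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    case_id: str
    situation: str  # what the agent was shown
    expected: str   # the decision the team agreed on in review


def pass_rate(cases: list[EvalCase], agent: Callable[[str], str]) -> float:
    """Fraction of cases where the agent's decision matches the team's."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if agent(c.situation) == c.expected)
    return hits / len(cases)


def always_escalate(situation: str) -> str:
    # Stand-in for a real agent call; it escalates everything.
    return "escalate"


# Illustrative cases only; a real set would come from the exception log.
CASES = [
    EvalCase("ex-001", "refund request over policy limit", "escalate"),
    EvalCase("ex-002", "order status question", "answer"),
]
```

Running `pass_rate(CASES, always_escalate)` on these two cases scores the stand-in at one match out of two, which is the kind of number a daily inspection routine could track between handbook versions.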

Talk to YNDR

If you have a workflow that needs an agent. And the discipline to supervise it.

Book a 30-minute call. Bring the workflow you're thinking about. We'll tell you whether it's an agent candidate, what the Agent Handbook would look like, and whether YNDR is the right team to build it. No pitch deck.

About the author · Chris Thomas, founder of YNDR. Builds operational AI agents on the Claude Agent SDK and advises mid-market through Fortune 100 on agent deployment. chris@yndr.com