A logistics company we talked to last year spent four months building an AI agent to handle vendor invoice approvals. The model was fine. The integration worked. It went live, and within two weeks the finance team had quietly routed around it because it kept approving invoices that should have been held. The reason was not the agent. It was that nobody had asked the one person who actually understood the process why she held roughly one invoice in nine. Her rule lived nowhere except her head, and the project had mapped the documented process instead of the real one.
This is the most common way AI automation projects die. Not in the model selection meeting, not in the integration sprint, but months earlier, in the discovery work that never happened. You cannot govern a workflow you have not mapped, and you cannot map a workflow by reading the standard operating procedure. This is the first post in a series on building governed AI workflows, and it starts where every real project should: with the process, not the model.
The Model Is the Last Thing You Should Pick
There is a strong pull, especially from technical teams, to start an automation project by choosing the architecture. Which model. Single agent or multi-agent. Which orchestration framework. This is backwards, and it is backwards for a reason that has nothing to do with technology.
Anthropic's own guidance on building effective agents opens by telling people not to build agents when a simpler workflow will do. LangChain makes a similar point in its workflows versus agents documentation: workflows follow predetermined paths and are predictable, while agents make dynamic decisions and cost you that predictability. Both pieces of advice assume you already know your process well enough to tell which one you need. Most teams do not.
When you start from the model, you make a quiet assumption that the process is a known quantity and the only open question is how clever the automation should be. The opposite is true. The process is the unknown. The model is the commodity. A 2026 MIT Sloan analysis of AI decision-making lands on the same point from the executive side: the organizations getting return on agentic AI are the ones that standardized and understood their processes first. The model is interchangeable. Your understanding of the work is not.
The business framing for a COO or operating partner is simple. You would not buy a company without diligence on how it actually makes money. Do not automate a process without diligence on how it actually runs.
Shadow the Operator Before You Spec the Agent
The single highest-value activity in discovery is also the one most often skipped: sit next to the person who does the work and watch them do it for a full cycle.
Not interview them. Watch them. The gap between what people say they do and what they actually do is enormous, and it is not because anyone is lying. The real process is full of micro-decisions, workarounds, and informal rules that have become so automatic the operator no longer notices them. Ask a claims adjuster how she processes a claim and she will describe six steps. Watch her and you will count fourteen, four of which involve checking something in a system nobody mentioned because "everyone knows you check that."
This is well-trodden ground in operations research. The discipline of process mining exists precisely because the documented process and the executed process diverge so reliably that you need event-log analysis to see the truth. You do not always need software for this. For a single mid-market workflow, a person with a notebook watching three or four full cycles will surface more than most process documentation ever captured.
When you shadow, capture four things:
- What the operator looks at before each decision, including the systems and the side channels (a Slack message, a sticky note, a colleague they ask).
- What they do at each step, in the order they actually do it, not the order the SOP claims.
- When they pause. Hesitation is signal. A pause usually means a judgment call or a missing piece of information.
- What they ignore. The fields they skip, the warnings they dismiss, the steps the manual requires that nobody has done in two years.
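One way to keep shadowing notes comparable across cycles is to record the four items above in the same shape every time. The sketch below is illustrative only: the `Observation` record, its field names, and the sample claim steps are hypothetical, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One step as actually observed, not as the SOP describes it."""
    step: str
    looked_at: list = field(default_factory=list)  # systems and side channels
    did: str = ""                                  # what the operator actually did
    paused: bool = False                           # hesitation before acting
    ignored: list = field(default_factory=list)    # fields, warnings, steps skipped


def follow_ups(observations):
    """Return steps worth a follow-up question: any pause, or anything skipped."""
    return [o.step for o in observations if o.paused or o.ignored]


# Hypothetical notes from one observed cycle.
observations = [
    Observation("Open claim", looked_at=["claims system"], did="read the summary"),
    Observation("Check prior claims", looked_at=["claims system", "colleague"],
                did="searched claimant history", paused=True),
    Observation("Fill intake form", looked_at=["intake form"],
                did="entered details", ignored=["optional notes field"]),
]

print(follow_ups(observations))
```

The list that comes back is a prompt for conversation with the operator, not a finding in itself.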
The Spiral Scout team makes a useful point in its write-up on turning a messy business process into a working AI agent: the messiness is not noise to be cleaned up before automation; it is the actual specification. If your map is tidy, you have probably missed something.
Map Inputs, Outputs, and the Handoffs Nobody Documented
Once you can see the process, render it as a flow of inputs and outputs with explicit handoffs. This sounds basic. It is basic. It is also where most "we already documented this" claims fall apart, because existing documentation almost always describes responsibilities, not data flow.
For each step, answer three questions. What comes in, what goes out, and who or what touches it next. The third question is where the value hides. Handoffs are where work stalls, where context gets lost, and where the most expensive errors are introduced.
Pay special attention to handoffs that cross a system boundary or a team boundary. A handoff inside one person's head is cheap. A handoff from the sales team's CRM to the finance team's ERP, mediated by a spreadsheet that someone exports every Friday, is the kind of thing that looks like a candidate for automation and frequently is. McKinsey's State of AI research consistently finds that the gains from automation cluster around these seams between functions, not within a single well-run team.
A simple discovery table beats a fancy diagram for getting sign-off from operations leadership:
| Step | Input | Output | Handoff to | Failure mode |
|---|---|---|---|---|
| Receive invoice | PDF from vendor email | Structured fields | Matching engine | OCR misreads totals |
| Match to PO | Invoice fields + PO record | Match / no-match flag | Approver queue | Partial shipments |
| Approve | Matched invoice | Approved / held | Payment run | Vendor on watchlist |
| Pay | Approved invoice | Payment record | Reconciliation | Duplicate payment |
The "failure mode" column is the one that earns its keep. It forces the conversation about exceptions, which is the part of discovery that actually determines whether an agent is worth building.
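If you want the table to be checkable rather than just readable, it can be kept as plain data. This is a minimal sketch: the rows are copied from the table above, while the `Step` record and the two helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input: str
    output: str
    handoff_to: str
    failure_mode: str


# Rows copied from the discovery table.
STEPS = [
    Step("Receive invoice", "PDF from vendor email", "Structured fields",
         "Matching engine", "OCR misreads totals"),
    Step("Match to PO", "Invoice fields + PO record", "Match / no-match flag",
         "Approver queue", "Partial shipments"),
    Step("Approve", "Matched invoice", "Approved / held",
         "Payment run", "Vendor on watchlist"),
    Step("Pay", "Approved invoice", "Payment record",
         "Reconciliation", "Duplicate payment"),
]


def missing_failure_modes(steps):
    """Return the names of steps whose failure-mode cell was left blank."""
    return [s.name for s in steps if not s.failure_mode.strip()]


def handoffs(steps):
    """List (step, handoff target) pairs so each seam can be reviewed in turn."""
    return [(s.name, s.handoff_to) for s in steps]


print(missing_failure_modes(STEPS))
for step_name, target in handoffs(STEPS):
    print(f"{step_name} -> {target}")
```

A blank failure-mode cell usually means the exception conversation for that step has not happened yet.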
The Exceptions Are the Work
Here is the counter-intuitive truth that separates real automation from demo-ware: the happy path is rarely worth automating on its own, and the exceptions are usually the entire point.
Most processes follow a familiar shape. Seventy to eighty percent of cases sail through a standard path. The remaining twenty to thirty percent are exceptions, and they consume the majority of the operator's time, attention, and expertise. An automation that handles the easy eighty percent and escalates everything else to a human has not removed the bottleneck. It has built a very expensive way to do the easy part while leaving the hard part exactly where it was.
This is the mistake the logistics company made. Their agent automated the eight invoices in nine that any junior clerk could approve, and failed on the ninth, which was the only one that ever needed a person in the first place.
So in discovery, treat exceptions as first-class citizens. For each exception type, capture:
- How often it occurs. Frequency tells you whether it is worth handling in the automation or worth a clean escalation path.
- How it is detected. Often the detection rule is itself the valuable, undocumented knowledge.
- What the operator does about it. Sometimes a rule, sometimes a judgment call, sometimes a phone call to someone three desks away.
- What it costs to get wrong. A misrouted support ticket and a wrongly approved six-figure invoice are not the same risk class.
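The four items above can be captured as one record per exception type, with a rough triage rule on top. Treat this as a sketch under stated assumptions: the `ExceptionType` record, the 5 percent frequency cut-off, the sample frequencies, and the rule that high-cost exceptions always go to a person are all illustrative choices, not figures from any real process.

```python
from dataclasses import dataclass
from enum import Enum


class CostClass(Enum):
    LOW = 1    # e.g. a misrouted support ticket
    HIGH = 2   # e.g. a wrongly approved six-figure invoice


@dataclass(frozen=True)
class ExceptionType:
    name: str
    share_of_cases: float  # how often it occurs, 0.0 to 1.0
    detection: str         # how the operator spots it
    response: str          # what the operator does about it
    cost: CostClass        # what it costs to get wrong


FREQUENT = 0.05  # hypothetical cut-off; set it per process


def triage(exc, frequent=FREQUENT):
    """Suggest a treatment for one exception type (assumed rule, not a standard)."""
    if exc.cost is CostClass.HIGH:
        return "route to a human approver"
    if exc.share_of_cases >= frequent:
        return "handle in the automation"
    return "clean escalation path"


# Illustrative entries; the frequencies and detection notes are made up.
ticket = ExceptionType("Misrouted support ticket", 0.08,
                       "customer replies that it is the wrong team",
                       "reassign to the right queue", CostClass.LOW)
invoice = ExceptionType("Invoice that should be held", 1 / 9,
                        "undocumented operator rule",
                        "hold for review", CostClass.HIGH)

print(triage(ticket))
print(triage(invoice))
```

Note that the `detection` field is free text on purpose. When the honest entry is "undocumented operator rule", that is the discovery work still to be done.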
Chip Huyen's observation about production systems applies directly here. In her writing on AI engineering pitfalls, she notes that the hard part is the last 40 percent, the distance between a system that works 60 percent of the time and one that works in production, and that this last stretch is almost entirely about handling cases the cheerful prototype ignored. Exceptions are that 40.
When you finish exception mapping, you often discover that an agent is not the right answer at all, or that only one sub-process is worth automating. That is a successful discovery, not a failed one. Knowing when to use agents versus deterministic automation is covered in agentic workflows vs traditional automation.
Document Where Judgment Actually Lives
The final discovery task, and the one that turns a map into a governed workflow, is to mark every point where genuine human judgment is required.
Not every decision is judgment. A great deal of what looks like judgment is actually an undocumented rule that an experienced person applies so fast it feels like intuition. Those become automation logic. Real judgment is different: it involves weighing factors that are not fully specifiable, accepting accountability for an outcome, or making a call where the cost of being wrong is high enough that a human should own it.
This distinction is the heart of governance. A governed AI workflow is not one with a vague "human in the loop" gesture somewhere in the diagram. It is one where you have identified the specific decisions that must route to a person, designed the approval mechanism around exactly those decisions, and made every automated step auditable. When a regulator, an auditor, or a PE operating partner asks "who approved this and why," the workflow has an answer.
Practically, sort every decision in your map into three buckets:
- Automate fully. Deterministic rules, clear inputs, low cost of error. The agent or workflow handles these end to end.
- Automate with approval. The system recommends or prepares, a human confirms. This is where most high-value, high-risk work belongs.
- Keep human. Genuine judgment, high accountability, or cases too rare and varied to justify automation logic.
That three-way sort is your governance spec. It tells you where to place approval gates, what to log, and what to escalate. It also gives operations leadership something concrete to sign off on before a line of integration code is written. The decisions you make here shape everything that comes after launch, which is why the operating model after launch depends so heavily on getting the judgment map right during discovery.
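The three-way sort can be written down as a small rule so that every decision in the map gets an explicit bucket. The sketch below is one possible encoding, not a standard: the `Decision` fields, the sample decisions, and the choice to default anything ambiguous to "automate with approval" are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    AUTOMATE_FULLY = "automate fully"
    AUTOMATE_WITH_APPROVAL = "automate with approval"
    KEEP_HUMAN = "keep human"


@dataclass(frozen=True)
class Decision:
    name: str
    deterministic: bool            # clear rule and clear inputs
    high_cost_of_error: bool
    needs_judgment: bool           # factors that cannot be fully specified
    rare_and_varied: bool = False  # too unusual to justify automation logic


def sort_decision(d):
    """Place one decision in a bucket; ambiguous cases default to approval."""
    if d.needs_judgment or d.rare_and_varied:
        return Bucket.KEEP_HUMAN
    if d.deterministic and not d.high_cost_of_error:
        return Bucket.AUTOMATE_FULLY
    return Bucket.AUTOMATE_WITH_APPROVAL


def approval_gates(decisions):
    """Names of decisions that need a human confirmation step."""
    return [d.name for d in decisions
            if sort_decision(d) is Bucket.AUTOMATE_WITH_APPROVAL]


# Hypothetical decisions from an invoice workflow.
decisions = [
    Decision("Match invoice to PO", deterministic=True,
             high_cost_of_error=False, needs_judgment=False),
    Decision("Approve large invoice", deterministic=True,
             high_cost_of_error=True, needs_judgment=False),
    Decision("Resolve vendor dispute", deterministic=False,
             high_cost_of_error=True, needs_judgment=True),
]

for d in decisions:
    print(f"{d.name}: {sort_decision(d).value}")
print(approval_gates(decisions))
```

Defaulting to approval when the inputs are unclear is the conservative choice here; a team could reasonably tighten or loosen it once the exception data is in.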
How OpenNash Can Help
Most teams that come to us already know what they want to automate. Fewer can show us how the process actually runs, and that gap is where we start. OpenNash begins every engagement with a discovery phase that maps operations, shadows the people doing the work, and documents the inputs, handoffs, exceptions, and judgment points before any agent is designed. That map becomes the spec for a governed build, with approval gates placed exactly where the judgment lives and full audit trails on everything automated.
This is not the right fit for every situation. If your process is already clean, deterministic, and well documented, a platform tool may get you there faster and cheaper, and we will tell you so. Custom is worth it when the workflow is specific to your business, the exceptions carry real risk, and you need to own the system outright after handoff. If you are still deciding whether your process is even ready, that itself is a discovery conversation worth having.
The path from a messy real-world process to a production system that operators trust runs straight through the work described here, and continues into the agent deployment checklist. If you want to map one of your processes before you commit to a build, book a call and we will walk through it with your operations team.