The question that actually matters
Most AI agent projects do not fail because the model could not produce a plausible answer. They fail because the team chose the wrong workflow. The demo looked impressive, but the process only happened a few dozen times a month, the quality bar was never measurable, or the step everyone wanted automated was exactly the step that required judgment no one could write down.
Good consulting front-loads that risk. The most valuable early work is not building an agent. It is deciding, with evidence, which workflow deserves to become the first agent and which ones should stay as ordinary software, human process, or later-stage candidates.
So the first real deliverable is a defensible answer to one question: of all the repetitive work in the business, which single workflow should become the first agent, and why that one?
A workflow is a good candidate when four things are true
The candidates that survive production tend to share four properties. First, the workflow repeats enough to matter. A process that runs 2,000 times a month at five minutes each is about 167 hours of monthly work - though that is the ceiling, not the saving, because the time the agent still needs for review and exception handling has to be subtracted back out. A process that runs 40 times a month may still be annoying, but it rarely justifies a production agent as the first build.
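The volume arithmetic is easy to make explicit. The sketch below is only illustrative: the one-minute review time, the 10% exception rate, and the five-minute duration for the 40-run process are assumptions chosen for the example, not figures from any real engagement.

```python
def gross_hours(runs_per_month: int, minutes_per_run: float) -> float:
    """Ceiling: monthly hours the workflow consumes today."""
    return runs_per_month * minutes_per_run / 60


def net_hours(runs_per_month: int, minutes_per_run: float,
              review_minutes_per_run: float, exception_rate: float) -> float:
    """Hours left once review and exception handling are subtracted.

    Assumes every run gets a short human review and that a fraction of
    runs (exception_rate) falls back to full manual handling.
    """
    gross = gross_hours(runs_per_month, minutes_per_run)
    review = runs_per_month * review_minutes_per_run / 60
    exceptions = runs_per_month * exception_rate * minutes_per_run / 60
    return gross - review - exceptions


# 2,000 runs a month at five minutes each (the ceiling from the text).
high_volume = gross_hours(2000, 5)

# 40 runs a month; five minutes per run is assumed here for comparison.
low_volume = gross_hours(40, 5)

# Hypothetical review time and exception rate.
net = net_hours(2000, 5, review_minutes_per_run=1, exception_rate=0.10)

print(f"ceiling: {high_volume:.1f} h, low volume: {low_volume:.1f} h, net: {net:.1f} h")
```

Under these assumed inputs the high-volume process still nets well over a hundred hours a month, while the 40-run process tops out at a few hours before any review time is subtracted.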
Second, success can be measured without a meeting. If the team cannot write down what a correct result looks like - resolved ticket, correctly coded invoice, approved draft, completed intake - it cannot build useful evals. Without evals, the agent cannot be trusted or improved.
Third, the failure mode is survivable. A draft that a human reviews has a cheap failure mode. A system that issues refunds, changes records, or touches regulated data has a more expensive one. Early candidates should have failures that are visible, reversible, and caught before they reach the customer.
Fourth, the tool surface is bounded. An agent that needs three clear APIs is buildable. An agent that needs to understand the whole business is a research project. The cleaner the inputs, actions, owners, and exceptions, the faster the project becomes real.
Simple scoring rubric
Treat measurability and survivable failure as gates, not scores: rate each from 1 to 5, and if either lands below 3, drop the candidate no matter how it ranks elsewhere. An unmeasurable or unsafe workflow is disqualified, not discounted. Score the survivors 1 to 5 on repetition and bounded tool surface, then rank them by hours at stake: volume times minutes per run, net of the review time the agent still needs. The best first workflow is rarely the flashiest demo. It is the highest-volume workflow that clears both gates - enough clarity and safety to teach the organization how to operate agents before the stakes get high.
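One way to keep the rubric honest is to write it down as code, so a gate cannot quietly turn into a weighted score. This is a minimal sketch, assuming a per-run review time as the only deduction; the `Candidate` fields and the three sample workflows are hypothetical.

```python
from dataclasses import dataclass
from typing import List

GATE_MINIMUM = 3  # below 3 on either gate disqualifies the candidate


@dataclass
class Candidate:
    name: str
    measurability: int           # gate, rated 1-5
    survivable_failure: int      # gate, rated 1-5
    repetition: int              # score, 1-5
    bounded_tools: int           # score, 1-5
    runs_per_month: int
    minutes_per_run: float
    review_minutes_per_run: float  # assumed human review time per run

    def passes_gates(self) -> bool:
        return min(self.measurability, self.survivable_failure) >= GATE_MINIMUM

    def net_hours(self) -> float:
        """Hours at stake: volume times minutes per run, net of review."""
        saved_per_run = self.minutes_per_run - self.review_minutes_per_run
        return self.runs_per_month * saved_per_run / 60


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Drop gate failures first, then sort survivors by net hours."""
    survivors = [c for c in candidates if c.passes_gates()]
    return sorted(survivors, key=lambda c: c.net_hours(), reverse=True)


# Hypothetical inventory, for illustration only.
inventory = [
    Candidate("invoice coding", 4, 4, 5, 4, 2000, 5, 1),
    Candidate("refund approval", 4, 2, 5, 3, 5000, 6, 1),  # fails the safety gate
    Candidate("contract intake", 3, 4, 3, 4, 300, 20, 5),
]

ranked = rank(inventory)
for c in ranked:
    print(f"{c.name}: {c.net_hours():.1f} net hours/month")
```

In this made-up inventory the refund workflow has the most hours at stake, yet it never reaches the ranking because it fails a gate. That is the point of gating before scoring.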
Workflow, agent, or no agent at all
Not every AI automation problem needs an agent. There is a ladder of complexity, and the safest move is to climb only as far as the workflow forces you.
A single model call with retrieval and strong examples can handle classification, drafting, summarization, and extraction. A workflow - fixed steps such as classify, retrieve, draft, check, route - handles tasks that can be decomposed in advance. A true agent earns its place when the path cannot be predicted ahead of time and the system must decide which tools to use across multiple steps.
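The ladder can be read as a decision rule that stops at the first rung that is sufficient. The sketch below reduces it to two yes/no questions; the names `Rung` and `lowest_sufficient_rung` are hypothetical, and real triage would rest on more evidence than two booleans.

```python
from enum import Enum


class Rung(Enum):
    SINGLE_CALL = "single model call with retrieval and examples"
    WORKFLOW = "fixed-step workflow"
    AGENT = "agent that chooses its own tools"


def lowest_sufficient_rung(single_step: bool, path_known_in_advance: bool) -> Rung:
    """Climb only as far as the task forces you."""
    if single_step:
        # Classification, drafting, summarization, extraction.
        return Rung.SINGLE_CALL
    if path_known_in_advance:
        # Steps such as classify, retrieve, draft, check, route.
        return Rung.WORKFLOW
    # The path cannot be predicted and spans multiple tool decisions.
    return Rung.AGENT


print(lowest_sufficient_rung(single_step=False, path_known_in_advance=True).value)
```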
A lot of requests that begin as 'we need an AI agent' resolve to 'we need one model call with two tools and a human review step.' Saying that is not being conservative. It is how you avoid paying for complexity before the workflow has earned it.
What a useful consulting engagement leaves behind
A serious engagement leaves artifacts the business can inspect. At minimum, there should be a scored workflow inventory, a workflow map for the chosen process, and the smallest prototype scope worth testing.
The workflow map should name the trigger, source system, required context, allowed tools, write actions, owner, exception types, review threshold, success metric, and rollback path. If those fields are missing, the team is probably still buying a demo rather than an operating plan.
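A workflow map with those fields can be checked mechanically. The following is a minimal sketch, assuming each field is captured as free text; the `WorkflowMap` class and the sample values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class WorkflowMap:
    trigger: Optional[str] = None
    source_system: Optional[str] = None
    required_context: Optional[str] = None
    allowed_tools: Optional[str] = None
    write_actions: Optional[str] = None
    owner: Optional[str] = None
    exception_types: Optional[str] = None
    review_threshold: Optional[str] = None
    success_metric: Optional[str] = None
    rollback_path: Optional[str] = None


def missing_fields(workflow_map: WorkflowMap) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(workflow_map) if not getattr(workflow_map, f.name)]


# Hypothetical draft map with two gaps: no owner and no rollback path.
draft = WorkflowMap(
    trigger="new invoice arrives in the shared inbox",
    source_system="accounting system",
    required_context="vendor history and chart of accounts",
    allowed_tools="invoice lookup, ledger code search",
    write_actions="propose a ledger code for human approval",
    exception_types="unknown vendor, missing purchase order",
    review_threshold="every proposal reviewed during the pilot",
    success_metric="share of invoices coded correctly on first pass",
)

print(missing_fields(draft))
```

An empty result from `missing_fields` does not make the plan good, but a non-empty one is a quick signal that the map is not yet an operating plan.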
The prototype scope should be deliberately narrow. The point is not to build the full vision first. The point is to test the riskiest assumption against real cases quickly enough that the business can change direction before budget hardens.
As a checklist, the engagement should leave behind:
- Scored workflow inventory with volume and risk assumptions visible.
- Workflow map with owners, systems, actions, exceptions, and rollback path.
- Prototype scope tied to a metric, baseline, and pass/fail test cases.
- Decision memo that recommends build, buy, wait, or use simpler automation.
How to know in two weeks, not two quarters
The fastest way to de-risk an agent project is to agree on test cases before arguing about architecture. Pick the workflow, write the normal case, the missing-information case, the tool-failure case, the policy-boundary case, and the regression case. Then decide which metric has to move for the project to continue.
Only after that should anyone build. If the prototype cannot move the agreed metric on real cases in two weeks, the workflow either is not ready, is not valuable enough, or needs a smaller design. That is a good outcome. Two weeks spent disqualifying a workflow is far cheaper than two quarters spent building the wrong one.
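The two-week check can be written down before any build starts. This sketch assumes the agreed metric is a simple pass rate over the test cases and that the continue rule is a minimum lift over a baseline; both are stand-ins, since the real metric depends on the workflow. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# The five case types named above.
CASE_TYPES = (
    "normal",
    "missing_information",
    "tool_failure",
    "policy_boundary",
    "regression",
)


@dataclass
class CaseResult:
    case_type: str
    passed: bool


def uncovered_case_types(results: List[CaseResult]) -> List[str]:
    """Case types with no test result yet."""
    seen = {r.case_type for r in results}
    return [t for t in CASE_TYPES if t not in seen]


def should_continue(results: List[CaseResult],
                    baseline_pass_rate: float,
                    required_lift: float) -> bool:
    """Continue only if every case type is covered and the metric moved enough."""
    if not results or uncovered_case_types(results):
        return False
    pass_rate = sum(r.passed for r in results) / len(results)
    return pass_rate - baseline_pass_rate >= required_lift


# Hypothetical prototype run: four of five cases pass.
results = [
    CaseResult("normal", True),
    CaseResult("missing_information", True),
    CaseResult("tool_failure", False),
    CaseResult("policy_boundary", True),
    CaseResult("regression", True),
]

print(should_continue(results, baseline_pass_rate=0.5, required_lift=0.2))
```

Fixing `baseline_pass_rate` and `required_lift` before the prototype exists is what keeps the two-week verdict from being renegotiated after the fact.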