Most teams building "AI agents" are building the wrong thing.
They read about autonomous systems that can browse the web, write code, and manage complex projects. They spin up the OpenAI API, wire up a dozen tools, and watch their agent hallucinate its way into a $400 API bill while accomplishing nothing useful.
The gap between demo and production isn't more autonomy. It's less. The teams shipping reliable AI systems aren't building autonomous agents. They're building carefully constrained workflows that use LLMs as components, not decision-makers.
Anthropic's Building Effective Agents guide codifies what practitioners have learned through painful experience: the most reliable agent architectures are the simplest ones that solve your problem. They outline five patterns, ordered from simplest to most complex, and the guidance is clear: exhaust each level before graduating to the next.
Here's what each pattern actually looks like in production, when to use it, and the failure modes that catch teams off guard.
Pattern 1: Prompt Chaining - The Workhorse You're Probably Underusing
Prompt chaining breaks a task into fixed steps, where each LLM call processes the output of the previous one. There's no autonomy here. The sequence is predetermined. The LLM is a sophisticated text processor, not a decision-maker.
The pattern:
Input → LLM Call 1 → Gate/Transform → LLM Call 2 → Gate/Transform → Output
Real production example: Document processing for legal contracts.
- Extract key terms, dates, parties from the document
- Gate: Verify extraction confidence > 0.9, else flag for human review
- Classify contract type and risk level
- Gate: If high-risk, route to senior review
- Generate summary and action items
- Validate summary against original (catch hallucinations)
Each step has a single, clear purpose. Each gate catches failures before they compound. The "agent" is really a sophisticated pipeline.
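The pipeline above can be sketched in a few lines. Here `call_llm` and `StepResult` are stand-ins for your provider's client (e.g. a wrapper around Anthropic's or OpenAI's API), not real library calls; the point is the fixed sequence with a gate after each step:

```python
# Sketch of a prompt chain with gates. `call_llm` is a placeholder for a
# real API call; swap in your provider's client.
from dataclasses import dataclass

@dataclass
class StepResult:
    text: str
    confidence: float

def call_llm(prompt: str, text: str) -> StepResult:
    # Placeholder: echoes the input with high confidence.
    return StepResult(text=text, confidence=0.95)

def process_contract(document: str) -> dict:
    extracted = call_llm("Extract key terms, dates, parties.", document)
    # Gate 1: low-confidence extractions go to a human, not downstream.
    if extracted.confidence < 0.9:
        return {"status": "human_review", "stage": "extraction"}

    classified = call_llm("Classify contract type and risk level.", extracted.text)
    # Gate 2: high-risk contracts route to senior review.
    if "high-risk" in classified.text.lower():
        return {"status": "senior_review", "stage": "classification"}

    summary = call_llm("Summarize and list action items.", extracted.text)
    # Gate 3: validate the summary against the source to catch hallucinations.
    check = call_llm("Does this summary match the source? Answer yes/no.", summary.text)
    if check.text.strip().lower().startswith("no"):
        return {"status": "human_review", "stage": "validation"}
    return {"status": "ok", "summary": summary.text}
```

Notice that every early return is a routing decision to a human, never a silent failure. That's the property that makes chains debuggable.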
When to use prompt chaining:
- Tasks with clear sequential steps
- When you need verifiable intermediate outputs
- When different steps benefit from different prompts or system instructions
- When you want to catch and handle failures at each stage
The failure mode nobody talks about: Latency accumulation. Five sequential LLM calls at 2 seconds each means 10 seconds minimum response time. Users don't wait 10 seconds. Design for parallelization from the start (Pattern 3), even if you implement sequentially first.
Cost optimization: Not every step needs your most capable model. Extraction and classification often work fine with Claude Haiku or GPT-4o-mini at 1/10th the cost. Reserve Opus/GPT-4 for reasoning-heavy steps.
Pattern 2: Routing - Cut Costs Without Cutting Corners
Routing uses an initial classification to direct inputs to specialized handlers. It's how you serve enterprise-grade quality without enterprise-grade bills.
The pattern:
Input → Classifier → Route A (simple) → Output
→ Route B (complex) → Output
→ Route C (specialist) → Output
Real production example: Customer support automation.
A financial services company processes 50,000 support tickets monthly. Before routing:
- Every ticket hits GPT-4: $0.03 per ticket average
- Monthly cost: $1,500
- Quality issues: Overkill for simple queries, undertrained for specialist topics
After implementing routing:
- Route A (60% of tickets): FAQ-style questions → RAG lookup + Haiku response ($0.002/ticket)
- Route B (30% of tickets): Standard support → GPT-4o-mini with company context ($0.008/ticket)
- Route C (10% of tickets): Complex/sensitive → GPT-4 with full reasoning ($0.05/ticket)
Monthly cost after routing: roughly $430 (30,000 × $0.002 + 15,000 × $0.008 + 5,000 × $0.05). Same quality, about 71% cost reduction.
Implementation detail that matters: Your classifier IS the routing decision. A bad classifier means expensive queries going cheap (quality degradation) or cheap queries going expensive (cost inflation). Invest heavily in classifier accuracy. It's the highest-impact optimization in this pattern.
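A minimal routing layer looks like this. The model names and per-ticket costs mirror the example above; `classify` is a keyword stand-in for what should be its own cheap LLM call or a fine-tuned classifier:

```python
# Sketch of cost-tiered routing. Costs and tiers follow the support example.
ROUTES = {
    "faq":      {"model": "claude-haiku", "cost_per_ticket": 0.002},
    "standard": {"model": "gpt-4o-mini",  "cost_per_ticket": 0.008},
    "complex":  {"model": "gpt-4",        "cost_per_ticket": 0.05},
}

def classify(ticket: str) -> str:
    # Placeholder heuristic. In production this is a dedicated classifier
    # trained on labeled tickets -- the highest-impact piece of the pattern.
    lowered = ticket.lower()
    if "password" in lowered or "reset" in lowered:
        return "faq"
    if any(w in lowered for w in ("merger", "legal", "fraud")):
        return "complex"
    return "standard"

def route(ticket: str) -> dict:
    label = classify(ticket)
    return {"route": label, **ROUTES[label]}

def monthly_cost(ticket_mix: dict) -> float:
    # ticket_mix maps route name -> monthly ticket count.
    return sum(ROUTES[r]["cost_per_ticket"] * n for r, n in ticket_mix.items())

# With the 60/30/10 split over 50,000 tickets:
# 30,000 * 0.002 + 15,000 * 0.008 + 5,000 * 0.05 = $430/month
```

The `monthly_cost` helper is worth keeping around in real systems: route-mix drift is how routing savings quietly evaporate.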
When to use routing:
- High-volume applications with mixed complexity
- When different query types genuinely need different handling
- When cost optimization is a priority
- When you have enough data to build a reliable classifier
The failure mode: Routing based on surface features rather than actual complexity. A short question isn't necessarily simple ("Should I exercise my stock options before the merger closes?"). Build your classifier on task complexity, not input length.
Pattern 3: Parallelization - The Latency Killer
Parallelization runs independent subtasks simultaneously, either processing different inputs (sectioning) or getting multiple perspectives on the same input (voting).
The pattern (sectioning):
Input → Split → [LLM Call A] / [LLM Call B] / [LLM Call C] (in parallel) → Aggregate → Output
The pattern (voting):
Input → [Prompt A] / [Prompt B] / [Prompt C] (same input, in parallel) → Vote/Merge → Output
Real production example (sectioning): Analyzing a 100-page annual report.
Sequential approach: 5 minutes, processing 10 pages at a time. Parallel approach: 45 seconds, processing all 10 chunks simultaneously.
The aggregation step synthesizes findings. You're trading straightforward implementation for dramatic latency improvement.
Real production example (voting): Code review automation.
Three parallel reviews with different focuses:
- Security vulnerabilities (prompt emphasizes OWASP Top 10)
- Performance issues (prompt emphasizes algorithmic complexity)
- Maintainability (prompt emphasizes code clarity, naming, structure)
Merge step combines findings, deduplicates, and ranks by severity. Each reviewer sees the same code but through a different lens.
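The voting pattern maps directly onto `asyncio.gather`. In this sketch `review` is a stand-in for an async call to your provider's client; the three prompts mirror the code-review example above:

```python
# Sketch of the voting pattern: three focused reviews run concurrently,
# so total latency is roughly the slowest single call, not the sum.
import asyncio

PROMPTS = {
    "security": "Review for OWASP Top 10 vulnerabilities.",
    "performance": "Review for algorithmic complexity issues.",
    "maintainability": "Review for clarity, naming, and structure.",
}

async def review(focus: str, prompt: str, code: str) -> dict:
    # Placeholder: a real implementation awaits your provider's async client.
    await asyncio.sleep(0)  # stands in for network I/O
    return {"focus": focus, "findings": []}

async def parallel_review(code: str) -> list:
    tasks = [review(focus, prompt, code) for focus, prompt in PROMPTS.items()]
    results = await asyncio.gather(*tasks)
    # Merge step: deduplication and severity ranking would happen here,
    # typically as one more LLM call over the combined findings.
    return list(results)

findings = asyncio.run(parallel_review("def f(x): return x"))
```

The same skeleton covers sectioning: replace the three prompts with one prompt applied to three document chunks.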
When to use parallelization:
- Latency-sensitive applications where sequential processing is too slow
- Tasks that naturally decompose into independent subtasks
- When multiple perspectives improve output quality
- When you can afford the increased API costs (parallel = more concurrent calls)
The failure mode: Assuming independence when subtasks actually depend on each other. If Chunk 3's analysis depends on understanding established in Chunks 1 and 2, parallelization introduces errors. Map your actual dependencies before parallelizing.
Cost consideration: Voting patterns multiply your API costs by the number of voters. Three parallel reviews = 3x the cost. Ensure the quality improvement justifies the expense: run A/B tests, not assumptions.
Pattern 4: Orchestrator-Workers - When You Actually Need Delegation
The orchestrator-workers pattern introduces real autonomy: a central LLM dynamically breaks down tasks and delegates to specialized workers. This is where "agent" starts meaning something.
The pattern:
Input → Orchestrator → [Analyze task, create subtasks]
→ Delegate to Worker A → Result A
→ Delegate to Worker B → Result B
→ [Synthesize results] → Output
Real production example: Competitive intelligence gathering.
User query: "How is Stripe positioning against Adyen in the European market?"
Orchestrator breaks this into:
- Search Worker: Find recent Stripe announcements about European expansion
- Search Worker: Find recent Adyen European market share data
- Analysis Worker: Compare pricing structures from public documentation
- Synthesis Worker: Identify positioning differences and strategic implications
The orchestrator doesn't know in advance exactly what information exists or what the workers will find. It adapts the synthesis based on what comes back.
Critical implementation detail: The orchestrator needs to know what each worker can do. This isn't magic; it's carefully designed tool descriptions and clear capability boundaries. Vague worker definitions lead to misrouted tasks and compounding errors.
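A skeleton that encodes those boundaries might look like the following. `plan` and `run_worker` are placeholders for LLM calls; the parts worth copying are the explicit capability registry, the hard iteration cap, and the explicit completion criterion:

```python
# Sketch of orchestrator-workers with guardrails against the failure
# modes listed below: capability checks, iteration limits, completion criteria.
WORKERS = {
    "search": "Finds recent public information on a topic.",
    "analysis": "Compares structured data from provided sources.",
    "synthesis": "Combines worker results into a final answer.",
}
MAX_ROUNDS = 3  # hard limit: guards against infinite delegation loops

def plan(query: str, results: list) -> list:
    # Placeholder planner: one search round, then stop. A real planner is
    # an LLM call that sees WORKERS' descriptions and prior results.
    if not results:
        return [("search", query)]
    return []  # empty plan = explicit completion criterion

def run_worker(name: str, task: str) -> str:
    if name not in WORKERS:
        # Orchestrator overreach: delegating to a worker that doesn't exist.
        raise ValueError(f"unknown worker: {name}")
    return f"[{name}] result for: {task}"

def orchestrate(query: str) -> str:
    results = []
    for _ in range(MAX_ROUNDS):
        subtasks = plan(query, results)
        if not subtasks:
            break
        results.extend(run_worker(name, task) for name, task in subtasks)
    return run_worker("synthesis", " | ".join(results))
```

Passing `results` back into `plan` is the minimal version of shared context; without it you get the worker-isolation failure mode below.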
When to use orchestrator-workers:
- Tasks where subtask structure can't be predetermined
- When specialized capabilities genuinely help (code generation worker vs. research worker)
- When you need to scale complexity beyond what a single prompt can manage
- When you have robust error handling and evaluation infrastructure
The failure modes (there are several):
- Orchestrator overreach: The orchestrator tries to delegate tasks the workers can't handle. Solution: Explicit capability descriptions and graceful failure handling.
- Worker isolation: Workers can't share context, leading to redundant work or contradictory outputs. Solution: Shared memory or context passing (adds complexity).
- Infinite loops: The orchestrator keeps delegating without converging on an answer. Solution: Hard limits on iterations and explicit completion criteria.
Honest assessment: Most teams implementing orchestrator-workers would be better served by well-designed prompt chains. The autonomy this pattern provides is seductive but expensive to make reliable. If you can enumerate your subtasks in advance, use prompt chaining instead.
Pattern 5: Evaluator-Optimizer - The Quality Ratchet
The evaluator-optimizer pattern generates output, evaluates it against criteria, and iteratively refines until quality thresholds are met. It's how you get outputs that meet specific standards, not just "pretty good" outputs.
The pattern:
Input → Generator → Draft → Evaluator → [Meets criteria?]
- No → [Generate specific feedback] → back to Generator
- Yes → Final Output
Real production example: Marketing copy generation for regulated industries.
A healthcare company needs marketing copy that's compelling AND compliant. First drafts from LLMs routinely include claims that would trigger FDA review.
The evaluator checks against:
- Prohibited claim patterns (specific regex + semantic matching)
- Required disclosure presence
- Tone guidelines (professional but accessible)
- Brand voice consistency
When evaluation fails, specific feedback goes back to the generator: "Claim 'clinically proven' on line 3 requires citation. Rephrase or add supporting evidence."
The loop continues until all criteria pass or max iterations hit (then human review).
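The loop structure is simple; the value is in the escape hatch. In this sketch `generate` and `evaluate` are placeholders for two different prompts (ideally different models), and the compliance check is reduced to a single substring test standing in for the real regex-plus-semantic matching:

```python
# Sketch of the evaluate-refine loop with a max-iteration escape hatch.
MAX_ITERATIONS = 3

def generate(brief: str, feedback: str = "") -> str:
    # Placeholder generator; a real one would incorporate the feedback.
    draft = f"Draft for: {brief}"
    if feedback:
        draft += " (revised)"
    return draft

def evaluate(draft: str) -> tuple:
    # Placeholder check standing in for prohibited-claim patterns,
    # required disclosures, tone, and brand-voice rules.
    if "clinically proven" in draft.lower():
        return False, "Claim 'clinically proven' requires a citation."
    return True, ""

def generate_with_review(brief: str) -> dict:
    feedback = ""
    draft = ""
    for i in range(MAX_ITERATIONS):
        draft = generate(brief, feedback)
        ok, feedback = evaluate(draft)
        if ok:
            return {"status": "ok", "draft": draft, "iterations": i + 1}
    # Didn't converge: escalate to a human instead of shipping a bad output.
    return {"status": "human_review", "draft": draft, "feedback": feedback}
```

Feeding the evaluator's specific feedback back into the generator, rather than just "try again", is what makes iterations converge.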
When to use evaluator-optimizer:
- Outputs must meet specific, verifiable criteria
- When "good enough" isn't good enough (legal, medical, financial content)
- When you can define clear evaluation rubrics
- When the cost of iteration is less than the cost of bad output
Implementation detail: The evaluator and generator should ideally use different prompts or even different models. If the same model that generated the error evaluates it, blind spots persist. Cross-model evaluation or specialized evaluation prompts catch more issues.
The failure mode: Evaluation criteria that the generator can't actually satisfy. If your evaluator demands perfect factual accuracy but your generator hallucinates, you get infinite loops. Match your evaluation criteria to what's actually achievable, then use external verification for claims that matter.
Cost and latency: This pattern multiplies both. Three iterations = 3x generator cost + 3x evaluator cost. For a 2-second generation + 1-second evaluation, three iterations means 9 seconds minimum. Design your criteria to pass on first attempt most of the time, with iteration as the exception.
Choosing Your Pattern: The Decision Framework
The temptation is to start complex. Resist it.
Start here:
- Can you enumerate the exact steps in advance? → Prompt Chaining
- Do you have high volume with mixed complexity? → Routing
- Are there independent subtasks that can run simultaneously? → Parallelization
- Must the task decomposition happen dynamically? → Orchestrator-Workers
- Must outputs meet specific verifiable criteria? → Evaluator-Optimizer
Combine patterns intentionally: Real production systems layer these. A routed system might use prompt chaining within each route. An orchestrator might spawn workers that use evaluator-optimizer loops. But start with one pattern, prove it works, then add complexity.
The meta-pattern for production:
Route → [Simple route: Prompt Chain]
→ [Complex route: Orchestrator-Workers with Evaluator loop]
This gives you cost efficiency (routing), reliability (chaining for simple cases), capability (orchestration for complex cases), and quality (evaluation for critical outputs).
What Nobody Tells You About Production Agent Systems
1. Evaluation is your actual product.
Your agent is only as good as your ability to measure whether it works. Before building any pattern, define:
- What does success look like for this task?
- How will you measure it automatically?
- What's the human baseline you're comparing against?
Hamel Husain's evaluation guide should be required reading before any agent implementation.
2. Simpler patterns have compounding advantages.
Every additional LLM call is a potential failure point. Prompt chains have N failure points. Orchestrator-workers have N × M failure points (N workers, M potential interactions). The math isn't linear. Complexity compounds.
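The compounding is easy to make concrete. Assuming each call succeeds independently with probability p (0.99 here is illustrative, not a measured figure), chain reliability is p to the power of the call count:

```python
# Reliability of a chain of n independent LLM calls, each succeeding
# with probability p. Illustrates why complexity compounds.
def chain_reliability(p: float, n: int) -> float:
    return p ** n

# A 99%-reliable call looks great in isolation...
print(round(chain_reliability(0.99, 1), 3))   # 0.99
# ...but a five-step chain already fails ~5% of the time...
print(round(chain_reliability(0.99, 5), 3))   # 0.951
# ...and a 20-call orchestrator run fails ~18% of the time.
print(round(chain_reliability(0.99, 20), 3))  # 0.818
```

This is the quantitative case for gates: each checkpoint resets the error budget instead of letting failures propagate.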
3. The "agent" framing is often wrong.
As Simon Willison notes, an agent is "an LLM running tools in a loop to achieve a goal." Most successful production systems don't fit this definition - they're sophisticated workflows with LLMs as components. That's not a limitation; it's a feature.
4. Cost optimization happens at the pattern level, not the prompt level.
Prompt engineering saves you 10-20%. Choosing the right pattern (routing to appropriate model tiers, parallelizing for throughput) saves you 60-80%. Optimize architecture first.
The Honest Path Forward
Here's what I'd tell a team starting today:
Week 1-2: Build the simplest possible version using prompt chaining. No frameworks: just API calls and Python. Get something working end-to-end.
Week 3-4: Instrument everything. Log inputs, outputs, latencies, costs, failure rates. You can't optimize what you can't measure.
Week 5-6: Based on actual data (not assumptions), identify your bottleneck. Is it cost? Add routing. Latency? Add parallelization. Quality? Add evaluation loops.
Week 7+: Only now consider orchestrator-workers or multi-agent architectures - and only if simpler patterns genuinely can't solve your problem.
The teams shipping reliable AI agents aren't the ones with the most sophisticated architectures. They're the ones who chose the simplest pattern that works and invested heavily in evaluation. Everything else is premature optimization - or worse, premature complexity.
The five patterns exist on a spectrum from fully deterministic to fully autonomous. Your job isn't to reach maximum autonomy. It's to find the minimum autonomy that solves your users' problems reliably. Start simple, measure obsessively, and earn your complexity.