The most profitable AI agents in production right now are not writing poetry or generating marketing campaigns. They're classifying invoices at a logistics company in Ohio. They're drafting support replies at a regional insurance provider. They're cleaning up CRM records at a mid-market SaaS company that hasn't updated contact data since 2019.
This is not the AI future that conference keynotes promised. It's better. It's the AI present that actually pays for itself.
After building agents for a dozen companies over the past two years, I've noticed a pattern: the less exciting the use case sounds in a pitch deck, the faster it hits positive ROI. The "boring" companies - distribution, insurance, manufacturing, professional services - are quietly getting more value from AI agents than most Silicon Valley startups, because they have mountains of repetitive, structured-enough work that no employee wants to do.
Here are five use cases we've seen work repeatedly, with the specific patterns that make them succeed.
1. Invoice Triage and Accounts Payable Routing
Every company with more than a handful of vendors has the same problem: invoices arrive in different formats, through different channels (email, portal, PDF, sometimes fax), and someone has to read each one, classify it, match it to a PO, and route it to the right approver.
An invoice triage agent handles the first 80% of this work. The pattern is straightforward:
- Ingest: Pull invoices from email attachments, shared drives, or AP portals
- Extract: Parse vendor name, amount, line items, PO number, due date
- Classify: Match against known vendor list, flag unknowns
- Route: Send to the correct approver based on amount thresholds and department
- Flag: Surface anomalies (duplicate invoice numbers, amounts that don't match PO, new vendors)
The extraction step is where AI agents shine over traditional OCR. A well-prompted Claude or GPT-4o call can handle invoices in wildly different formats - handwritten notes, scanned PDFs, spreadsheets disguised as invoices - without building custom templates for each vendor.
Anthropic's guide on building effective agents calls this a "prompt chaining" pattern: each step feeds into the next with structured outputs. You don't need a complex autonomous agent. You need a reliable pipeline.
What makes this work in practice: Keep the agent in "extract and suggest" mode for the first month. Show the AP team what the agent classified and let them correct mistakes. Use those corrections to improve your prompts. We typically see accuracy go from 85% to 95%+ within three weeks of this feedback loop.
The numbers: A mid-size distributor processing 2,000 invoices per month cut their AP team's classification time from 4 hours/day to under 45 minutes. The agent handles the clear-cut 80%, and humans focus on the 20% that actually need judgment.
2. Support Reply Drafting and Policy Lookup
Customer support is the use case everyone thinks about, but most companies implement it wrong. They try to build a fully autonomous chatbot that handles everything. Then it hallucinates a refund policy that doesn't exist, and the project gets killed.
The version that works is much more modest: an agent that drafts replies for human agents and pulls relevant policy docs.
Here's the pattern:
- Ticket comes in: Customer writes "I was charged twice for my subscription"
- Agent classifies: Billing issue, priority medium, likely duplicate charge
- Agent retrieves: Pulls the company's duplicate charge policy, the customer's billing history, and any recent similar tickets
- Agent drafts: Writes a reply acknowledging the issue, referencing the specific charges, and proposing the resolution outlined in policy
- Human reviews: Support agent reads the draft, tweaks if needed, sends
This is what Chip Huyen describes as the sensible middle ground between "no AI" and "full autonomy." The agent does the tedious parts (reading policy docs, looking up account history, writing the first draft), and the human does the judgment part (is this the right resolution for this specific customer?).
Where companies mess this up: They skip the retrieval step. An agent without access to your actual policies will make up policies that sound plausible. Always ground the agent in your real documentation using RAG or direct tool access to your knowledge base.
The numbers: A B2B software company with a 12-person support team reduced average first-response time from 4.2 hours to 1.1 hours. The agent didn't replace anyone - it just removed the 20 minutes of searching and drafting that preceded every response. Intercom's Fin agent reports similar patterns, with their AI resolving 50% of support volume for customers who deploy it properly. The key word is "properly" - that means good documentation and a well-structured knowledge base.
3. Scheduling, Follow-ups, and Calendar Management
Scheduling agents are deceptively simple - and deceptively valuable. Every sales team, professional services firm, and recruiting department burns hours on the back-and-forth of finding meeting times, sending reminders, and following up when people don't respond.
The agent pattern here is a classic "tool-user loop" as Simon Willison defines it: the LLM checks calendars, proposes times, sends emails, and loops until the meeting is booked or a human intervenes.
What makes scheduling agents interesting is how many adjacent tasks they absorb:
- Pre-meeting prep: Pull the attendee's LinkedIn profile, recent emails, CRM notes, and last meeting summary
- Follow-up generation: After a meeting, draft a summary email with action items
- No-show handling: If someone doesn't show up, automatically send a reschedule request with new time slots
- Sequence management: For sales, manage the entire outbound cadence - initial outreach, follow-up 1, follow-up 2, breakup email
Clockwise published research showing that the average knowledge worker spends 7.5 hours per week in meetings and another 3-4 hours managing the logistics around them. A scheduling agent attacks that second number directly.
The implementation detail that matters: Your agent needs reliable calendar access with proper scoping. It should only read/write to the calendars it's authorized for, and it should never double-book. This sounds obvious, but calendar APIs are surprisingly tricky - timezone handling alone will eat a week of development time if you're not careful.
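As a taste of why timezones eat development time, here's a sketch of the slot-finding core using only the standard library's timezone-aware datetimes. The helper and the sample calendars are hypothetical; a real agent would pull busy intervals from a calendar API, but the same rule applies: keep every datetime timezone-aware and let the comparisons work across zones:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def free_slots(busy, day_start, day_end, duration):
    """Return open slots of `duration` between day_start and day_end,
    given a list of (start, end) busy intervals (all aware datetimes)."""
    slots, cursor = [], day_start
    for start, end in sorted(busy):
        if start - cursor >= duration:
            slots.append((cursor, cursor + duration))
        cursor = max(cursor, end)
    if day_end - cursor >= duration:
        slots.append((cursor, cursor + duration))
    return slots

ny = ZoneInfo("America/New_York")
london = ZoneInfo("Europe/London")

# Each attendee's busy blocks in their own timezone; aware datetimes
# compare correctly across zones, so no manual offset math is needed.
busy = [
    (datetime(2025, 3, 3, 9, 0, tzinfo=ny), datetime(2025, 3, 3, 10, 0, tzinfo=ny)),
    (datetime(2025, 3, 3, 15, 0, tzinfo=london), datetime(2025, 3, 3, 16, 0, tzinfo=london)),
]
slots = free_slots(
    busy,
    datetime(2025, 3, 3, 9, 0, tzinfo=ny),   # working window start
    datetime(2025, 3, 3, 12, 0, tzinfo=ny),  # working window end
    timedelta(minutes=30),
)
# Only one 30-minute opening survives both calendars: 11:00 New York time.
```

Do the naive version with UTC offsets hardcoded and it breaks twice a year at daylight-saving transitions - which is exactly the week of debugging the paragraph above warns about.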
The numbers: A consulting firm with 40 consultants saved roughly 6 hours per consultant per week on scheduling logistics. The agent handles 90% of scheduling autonomously and escalates conflicts (like double-booked VIPs) to a human coordinator.
4. CRM Cleanup and Enrichment
This is the use case nobody puts on a slide deck, but every sales leader desperately needs.
Here's the reality of most CRM systems: 25-40% of contact records are incomplete, outdated, or duplicated. Salesforce's own research estimates that bad CRM data costs companies an average of 12% of revenue through missed opportunities and wasted outreach. Nobody wants to spend their Saturday cleaning up 50,000 contact records. So nobody does it, and the data keeps rotting.
A CRM cleanup agent does three things:
- Deduplication: Find records that are probably the same person or company (fuzzy matching on name, email domain, phone number)
- Enrichment: For records missing key fields (title, company size, industry), search public sources and fill in the gaps
- Decay detection: Flag records where the person has likely changed jobs (email bounces, LinkedIn title change, company domain redirect)
The technical pattern is a batch processing agent that runs nightly or weekly. It's not real-time, and it doesn't need to be. It pulls a batch of records, makes API calls to enrichment services, applies fuzzy matching algorithms, and writes back suggestions for a human to approve.
The key design decision: Never let the agent merge or delete records autonomously on day one. Start with a "suggest and review" workflow. The agent flags probable duplicates and a human clicks "merge" or "skip." After you've validated the agent's judgment on a few hundred records, you can increase autonomy for high-confidence matches (like identical email addresses).
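The tiered-autonomy idea can be sketched in a few lines. This uses the standard library's `difflib` for fuzzy name matching purely for illustration - production systems typically use dedicated entity-resolution tooling - and the 0.85 cutoff and field names are hypothetical:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1]; a stand-in for real fuzzy matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedup_action(rec_a: dict, rec_b: dict) -> str:
    """Tiered autonomy: only identical email addresses qualify for
    auto-merge; fuzzy name matches go to human review."""
    if rec_a["email"] and rec_a["email"].lower() == rec_b["email"].lower():
        return "auto_merge"        # high confidence: same mailbox
    name_score = similarity(rec_a["name"], rec_b["name"])
    same_domain = rec_a["email"].split("@")[-1] == rec_b["email"].split("@")[-1]
    if name_score > 0.85 and same_domain:
        return "suggest_merge"     # human clicks "merge" or "skip"
    return "keep_separate"

# Close name, same company domain, different mailbox: suggest, don't act.
print(dedup_action(
    {"name": "Jon Smith", "email": "jon.smith@acme.com"},
    {"name": "Jon Smyth", "email": "jsmyth@acme.com"},
))
```

The shape mirrors the day-one rule above: the only branch that skips review is the one where the evidence is essentially exact.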
Brex's engineering blog has documented how they use AI for internal data quality, and their approach mirrors this: start conservative, measure precision, then gradually increase automation.
The numbers: A SaaS company with 120,000 CRM records ran a cleanup agent for two weeks. It identified 18,000 duplicates (15% of the database), enriched 31,000 records with missing fields, and flagged 8,000 likely-stale contacts. Their sales team reported a 22% improvement in email deliverability in the following quarter.
5. Finance Operations and Anomaly Detection
Finance teams deal with pattern recognition problems every day: expense reports that look suspicious, transactions that don't match expected patterns, budget line items that are trending in the wrong direction. Most of this analysis happens manually in spreadsheets, usually too late to prevent the problem.
A finance anomaly detection agent monitors transaction streams and flags outliers. The pattern is deliberately narrow:
- Expense review: Flag expense reports that exceed policy limits, have unusual vendors, or show patterns consistent with fraud (round numbers, just-under-threshold amounts, weekend submissions)
- Vendor payment monitoring: Detect duplicate payments, payments to inactive vendors, or amounts that deviate significantly from historical patterns
- Budget variance: Track actual spend against budget in real-time and alert when a category is trending toward overrun
This is not a general-purpose "analyze our finances" agent. That would be dangerous. It's a set of specific detectors, each scoped to a narrow problem, running against structured financial data.
Netflix's engineering team has written extensively about anomaly detection in their systems, and the principles translate directly: define what "normal" looks like, set thresholds for deviation, and keep false positive rates low enough that humans don't start ignoring alerts.
The critical constraint: Finance agents should flag, not act. An agent that moves money or approves payments is a security and compliance nightmare. The right architecture is: agent detects anomaly, creates alert with evidence, human reviews and decides. The OpenAI practical guide to building agents calls this the "human-in-the-loop" pattern, and for finance, it's non-negotiable.
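A minimal flag-don't-act sketch, combining three of the expense checks mentioned above (duplicate payments, just-under-threshold amounts, deviation from vendor history). The approval limit, the 95% band, and the 3-sigma rule are hypothetical tuning choices, not recommendations:

```python
from statistics import mean, stdev

APPROVAL_LIMIT = 500.00  # hypothetical expense policy threshold

def flag_expense(amount: float, vendor: str, history: list[float],
                 seen: set[tuple[str, float]]) -> list[str]:
    """Return alert reasons for one expense. The agent only flags;
    a human reviews the evidence and decides."""
    alerts = []
    if (vendor, amount) in seen:
        alerts.append("possible_duplicate_payment")
    # Amounts parked just under the approval limit are a classic fraud tell.
    if APPROVAL_LIMIT * 0.95 <= amount < APPROVAL_LIMIT:
        alerts.append("just_under_approval_threshold")
    # Compare against this vendor's history once there's enough of it.
    if len(history) >= 5:
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(amount - mu) > 3 * sigma:
            alerts.append("deviates_from_vendor_history")
    seen.add((vendor, amount))
    return alerts

seen_payments: set[tuple[str, float]] = set()
acme_history = [100.0, 110.0, 95.0, 105.0, 100.0]
# A typical payment raises nothing; a repeat of the same (vendor, amount)
# pair, or a $495 expense against a ~$100 history, raises flags.
print(flag_expense(102.0, "Acme", acme_history, seen_payments))
```

Note what's absent: there is no code path that approves, holds, or moves money. The function's entire output is a list of reasons for a human to look.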
The numbers: A manufacturing company processing 5,000 transactions per month deployed an anomaly detection agent that caught $340,000 in duplicate payments in its first quarter - payments that had been slipping through their manual three-way match process for months. The agent's false positive rate stabilized at 3% after two weeks of tuning.
The Pattern Behind All Five
If you look at these five use cases, they share a structure:
| Property | What works | What doesn't |
|---|---|---|
| Scope | One specific task | "Handle all of finance" |
| Autonomy | Draft and suggest | Act without review |
| Data | Structured or semi-structured | Completely unstructured |
| Feedback | Human corrections improve it | Set and forget |
| Risk | Mistakes are catchable | Mistakes hit customers directly |
This maps to what I'd call the Agent ROI Ladder:
- Level 1: Draft and summarize (support replies, meeting notes)
- Level 2: Classify and route (invoices, tickets, leads)
- Level 3: Recommend actions (anomaly alerts, CRM suggestions)
- Level 4: Execute with approval (schedule meetings, merge records)
- Level 5: Execute autonomously with audit trail (expense pre-approval, auto-routing)
Most companies should start at Level 1 or 2. The temptation is to jump to Level 5 because it sounds more impressive. Resist that. Every successful Level 5 deployment we've seen started as a Level 2 that earned trust over months.
Getting Started Without Getting Burned
If you're evaluating AI agents for your "boring" company, here's what I'd do:
Pick the use case with the most volume and the least risk. Invoice classification beats autonomous payment processing. Support drafting beats autonomous refund approval. CRM cleanup beats automated outreach.
Measure before you build. How many invoices per month? How long does classification take? What's the error rate today? You need these numbers to prove ROI later, and you need them now to scope the project.
Budget for integration, not just AI. The LLM API call is the cheap part. Connecting to your AP system, your CRM, your ticketing tool - that's where the time goes. Gartner's 2025 AI implementation survey found that integration work accounts for 60-70% of total project cost in enterprise AI deployments.
Set a 30-day evaluation window. Run the agent in shadow mode (it processes everything but a human still does the real work) for the first month. Compare the agent's outputs to the human's decisions. If accuracy is above 90%, start shifting real work to the agent. If it's between 80% and 90%, keep iterating on prompts and data before expanding scope. If it's below 80%, your prompts or data need deeper work before you go further.
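The shadow-mode comparison itself is a few lines. This assumes you've logged the agent's output and the human's actual decision for each item; the function name and the verdict bands are illustrative, not a standard:

```python
def shadow_mode_report(agent_outputs: list[str],
                       human_decisions: list[str]) -> tuple[float, str]:
    """Compare the agent's shadow-mode classifications against what
    humans actually did over the evaluation window."""
    assert len(agent_outputs) == len(human_decisions)
    matches = sum(a == h for a, h in zip(agent_outputs, human_decisions))
    accuracy = matches / len(human_decisions)
    # Hypothetical decision bands matching the 90% / 80% rule of thumb.
    if accuracy >= 0.90:
        verdict = "start shifting real work to the agent"
    elif accuracy >= 0.80:
        verdict = "keep iterating on prompts and data"
    else:
        verdict = "not ready: fix prompts or data before going further"
    return accuracy, verdict
```

In practice you'd break this down per category - a 92% overall score can hide a category the agent gets wrong every time - but the overall number is what gates the go/no-go decision.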
The companies getting the most value from AI agents right now aren't the ones with the flashiest demos. They're the ones that picked a boring problem, built a narrow solution, and let it compound over time. That's not a pitch deck story. It's a P&L story. And those are the ones that matter.