A customer asks for a refund on a duplicate charge. Your AI agent reads the order, confirms the double billing, issues the refund, and closes the ticket. Clean. Then the customer replies: the refund went to a closed card, and now they want it as store credit, plus they mention the original order arrived damaged. The agent, having already "resolved" the case, has no memory of the refund it just issued and no path to pull a human in with that history attached. The customer re-explains everything from scratch to an agent who starts cold.

That second half is where Tier 2 automation actually lives, and where most of it breaks. Tier 1 deflection (order status, password resets, policy lookups) is a solved problem. Tier 2 is the harder tier: refunds, order remediation, plan changes, and multi-step troubleshooting that touch your systems of record. The failure mode that sinks these projects is rarely the model picking the wrong answer. It is the agent doing something real and then severing the thread when a human needs to take over.

Tier 2 is an actions problem, not a chat problem

The instinct from Tier 1 is to think about Tier 2 as "smarter answers." It isn't. As Moveworks lays out in its enterprise guide to Tier 2 help desk support, Tier 2 is defined by escalated complexity and the need to actually change state: provisioning access, modifying accounts, coordinating across systems. The unit of work is an action against a backend, not a sentence.

That reframing changes the whole design. A Tier 1 bot can be wrong and the cost is a confused customer who re-asks. A Tier 2 agent that issues a $4,000 refund to the wrong account, downgrades the wrong subscription, or cancels an order that already shipped creates a financial and trust problem that a follow-up message cannot undo. McKinsey's analysis of AI-enabled customer service makes the point that the value sits in resolving issues end to end, but end-to-end resolution means write access, and write access means you need controls that informational bots never required.

So before you automate a single Tier 2 ticket, you have to inventory the actions inside it.

Build an action ledger and classify by blast radius

The most useful framework here is dead simple. List every action your Tier 2 team performs, then sort each one into three buckets by how much damage it can do and how hard it is to undo:

| Action class | Examples | Default automation posture |
|---|---|---|
| Read-only | Look up order history, check entitlement, read account status, pull shipping tracking | Automate fully |
| Reversible write | Apply credit under $X, reset a feature flag, extend a trial, re-send a shipment | Automate within policy limits, log everything |
| Irreversible / high-blast | Issue refund to original payment, cancel a paid plan, delete data, refund above threshold | Gate behind human approval |

This ledger is the single most important artifact in a Tier 2 project, and almost nobody builds it. It forces the conversation that actually matters: not "can the AI handle refunds," but "which refunds, up to what amount, under what conditions, and with what undo path." A return inside the stated window with a matching order is a reversible write. A goodwill refund of $800 on a churned account is an irreversible, high-blast action. They are not the same automation just because both contain the word "refund."
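To make the refund distinction concrete, here is a minimal Python sketch of placing one refund request in the ledger based on its conditions. The field names, the `ActionClass` labels, and the $100 `AUTO_REFUND_LIMIT` are hypothetical stand-ins for whatever your own policy defines. The point is that classification depends on the return window, order match, goodwill, account state, and amount, not on the action's name.

```python
from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    READ_ONLY = "read-only"
    REVERSIBLE_WRITE = "reversible write"
    IRREVERSIBLE = "irreversible / high-blast"


@dataclass
class RefundRequest:
    amount: float
    within_return_window: bool
    order_matches: bool
    goodwill: bool          # discretionary refund not backed by a return
    account_churned: bool


# Hypothetical threshold; in practice this comes from your refund policy.
AUTO_REFUND_LIMIT = 100.0


def classify_refund(req: RefundRequest) -> ActionClass:
    """Place a refund in the ledger by its conditions, not by the word 'refund'."""
    # Goodwill refunds and churned accounts have no clean undo path.
    if req.goodwill or req.account_churned:
        return ActionClass.IRREVERSIBLE
    # Amounts above the threshold are high-blast regardless of other conditions.
    if req.amount > AUTO_REFUND_LIMIT:
        return ActionClass.IRREVERSIBLE
    # An in-window return against a matching order is the reversible case.
    if req.within_return_window and req.order_matches:
        return ActionClass.REVERSIBLE_WRITE
    # Anything the rules don't recognize defaults to the most restrictive class.
    return ActionClass.IRREVERSIBLE
```

Note the final fallback: any combination the rules don't recognize lands in the most restrictive class, which keeps an incomplete ledger from silently authorizing too much.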

Anthropic's guide to building effective agents argues for the simplest pattern that solves the problem and for keeping humans in the loop where stakes are high. For Tier 2, "simplest" usually means a tightly scoped tool-using agent with a short list of well-described actions, not an autonomous system that can call anything. The action ledger is how you decide what goes on that short list.

Key takeaway: you are not automating tickets, you are authorizing actions. Authorize narrowly.

Permission gates and write-backs are the control layer

Once you know which actions need a gate, the gate itself has to be real, not a checkbox. A permission gate for Tier 2 has three parts:

  1. A policy boundary the agent checks before acting. Refund only within the return window, only when the order status is "delivered" or "lost," only up to the stored threshold for that account tier. The agent evaluates these against your data, not against a vibe.
  2. An approval step for anything outside the boundary. When the agent wants to act beyond policy, it drafts the action (refund $X to account Y because Z) and routes it to a human who approves or rejects with one click. The human is approving a specific, pre-filled action, not re-investigating the whole ticket.
  3. A write-back that records what happened. Every action the agent takes gets written back to the ticket and the system of record with the reasoning, the inputs, and the result. This is what makes the action auditable and, when needed, reversible.
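The first two parts of the gate can be sketched in Python roughly as follows. This is an illustration under assumptions, not a reference implementation: the 30-day window, the per-tier limits, and the `Order` and `GateDecision` shapes are hypothetical. What it shows is the structure: the policy check runs against order data, and anything outside the boundary comes back as a specific, pre-filled action a human can approve or reject in one click.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy values: replace with the thresholds your policy defines.
RETURN_WINDOW_DAYS = 30
ALLOWED_STATUSES = {"delivered", "lost"}
TIER_REFUND_LIMITS = {"standard": 100.0, "premium": 250.0}


@dataclass
class Order:
    order_id: str
    status: str
    days_since_order: int
    account_tier: str


@dataclass
class GateDecision:
    auto_approved: bool
    violations: List[str] = field(default_factory=list)
    draft: Optional[str] = None  # pre-filled action for a human to approve or reject


def check_refund_policy(order: Order, amount: float) -> List[str]:
    """Part 1: evaluate the policy boundary against order data."""
    violations = []
    if order.days_since_order > RETURN_WINDOW_DAYS:
        violations.append(f"outside {RETURN_WINDOW_DAYS}-day return window")
    if order.status not in ALLOWED_STATUSES:
        violations.append(f"order status is '{order.status}'")
    # Unknown tiers get a zero limit, so every refund on them needs approval.
    limit = TIER_REFUND_LIMITS.get(order.account_tier, 0.0)
    if amount > limit:
        violations.append(f"amount ${amount:.2f} exceeds ${limit:.2f} tier limit")
    return violations


def gate_refund(order: Order, account_id: str, amount: float, reason: str) -> GateDecision:
    """Part 2: act inside the boundary, otherwise draft a specific action for approval."""
    violations = check_refund_policy(order, amount)
    if not violations:
        return GateDecision(auto_approved=True)
    draft = (
        f"Refund ${amount:.2f} to account {account_id} "
        f"(order {order.order_id}) because {reason}"
    )
    return GateDecision(auto_approved=False, violations=violations, draft=draft)
```

Returning the list of violations alongside the draft gives the approver the "because" without re-reading the ticket.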

The write-back is the piece teams skip, and it is the one that saves you. DevRev's help desk automation strategy guide emphasizes connecting automation into the systems agents already use rather than building a parallel universe. That matters for Tier 2 specifically: the refund the agent issued has to show up in Zendesk, in your billing system, and in the customer's account history as a traceable event with an actor and a reason. If the only record of an automated action is buried in a model log, your human agents are flying blind the moment they take over.

A concrete shape that works: the agent operates against your existing tools (the helpdesk plus the backend APIs), so a "process return" action calls the same returns endpoint a human would, with the same validations, and logs the same audit entry plus the agent's reasoning. No shadow workflow, no separate refund path the finance team can't see.
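A minimal sketch of that shape, assuming your returns endpoint is reachable as a plain function. `call_returns_endpoint`, the two log lists, and the entry fields are hypothetical placeholders for your real returns API, helpdesk ticket, and system of record. The wrapper adds nothing to the endpoint's own validations; it only guarantees that every attempt, successful or not, leaves the same audit entry with an actor, inputs, reason, and result.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def process_return_with_writeback(
    order_id: str,
    amount: float,
    reason: str,
    call_returns_endpoint: Callable[[str, float], Dict],  # hypothetical: same endpoint a human uses
    ticket_log: List[Dict],   # stand-in for the helpdesk ticket's audit trail
    record_log: List[Dict],   # stand-in for the system of record's history
    actor: str = "ai-agent",
) -> Dict:
    """Call the existing returns path, then write the same audit entry everywhere."""
    try:
        # The endpoint's own validations run unchanged; there is no shadow path.
        result = call_returns_endpoint(order_id, amount)
        status = "succeeded"
    except Exception as exc:
        # Failures are recorded too, so the human sees what was attempted.
        result = {"error": str(exc)}
        status = "failed"

    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": "process_return",
        "inputs": {"order_id": order_id, "amount": amount},
        "reason": reason,
        "status": status,
        "result": result,
    }
    # Write to both places so neither view is missing the event.
    ticket_log.append(entry)
    record_log.append(entry)
    return entry
```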

Escalation is the product, not the fallback

Here is the counter-intuitive part. The success metric for Tier 2 automation should not be how many tickets the agent closes alone. It should be how clean the escalations are when it cannot.

Think about why Tier 2 exists at all: these are the cases that have already proven harder than Tier 1, and a meaningful share of them will, and should, end up with a human. IrisAgent's work on reducing ticket escalations frames escalation reduction as the goal, but the more durable framing is escalation quality. Since Tier 2 escalations will never reach zero, the ones that do happen need to land in a human's lap fully assembled.

A clean handoff carries:

  • The full customer history across channels, not just the current chat. If the customer emailed last week and is now in chat, that context follows them.
  • Every action the agent already attempted or completed, with results. "Issued $40 credit at 14:02, refund to original card failed (card closed)."
  • The agent's current read of the problem and why it stopped. "Customer wants store credit instead, and reports the item arrived damaged. This requires a goodwill decision above the $100 auto-approve limit."
  • A link to the trace so the human can verify rather than re-interrogate.
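One way to make those four elements enforceable is to treat the handoff as a typed packet and refuse to escalate while pieces are missing. The Python sketch below uses hypothetical field names and is not tied to any particular helpdesk's data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActionRecord:
    time: str
    description: str
    outcome: str


@dataclass
class HandoffPacket:
    customer_id: str
    channel_history: List[str]   # prior contacts across email, chat, phone, etc.
    actions: List[ActionRecord]  # everything attempted or completed, with results
    assessment: str              # the agent's current read of the problem
    stop_reason: str             # why the agent handed off
    trace_url: str               # link so the human can verify instead of re-asking

    def missing_fields(self) -> List[str]:
        """Return empty handoff elements; an empty action list is allowed."""
        required = {
            "channel_history": self.channel_history,
            "assessment": self.assessment,
            "stop_reason": self.stop_reason,
            "trace_url": self.trace_url,
        }
        return [name for name, value in required.items() if not value]

    def summary(self) -> str:
        """Render a short briefing for the human who picks up the case."""
        lines = [f"Customer {self.customer_id}: {len(self.channel_history)} prior contacts"]
        for a in self.actions:
            lines.append(f"- {a.time} {a.description}: {a.outcome}")
        lines.append(f"Assessment: {self.assessment}")
        lines.append(f"Stopped because: {self.stop_reason}")
        lines.append(f"Trace: {self.trace_url}")
        return "\n".join(lines)
```

An empty action list is permitted because an agent may escalate before acting, but an empty history, assessment, stop reason, or trace link is exactly the cold restart this section warns about.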

When you measure this, the win condition flips. A messy escalation, where the human has to re-ask the customer everything, is a failure even if the agent's deflection number looks great. Cross-channel memory is what makes the difference: the context the agent gathered does not evaporate at the handoff line.

Klarna's widely cited result, where its AI assistant handled two-thirds of customer service chats in its first month, gets quoted as a deflection story. The more interesting read for Tier 2 builders is the other third: the cases that still needed people, and whether those people inherited context or started cold. The deflection number is only safe to chase if the handoff for the remainder is solid.

Key takeaway: design the escalation path first, then build resolution paths back from it. If the handoff is broken, every autonomous resolution is one customer away from a cold restart.

Start narrow, prove the trace, then widen

The fastest way to lose trust in Tier 2 automation is to launch it across every ticket type at once. The fastest way to build trust is to ship one narrow slice and let the data accumulate.

A rollout that holds up:

  • Pick one ticket type and one reversible action. For example, "returns within the stated window for delivered orders." High volume, clear policy, easy to reverse.
  • Run in suggest-only mode first. The agent drafts the action and a human approves every one. You are not saving time yet; you are collecting evidence about how often the agent is right and where it strays. Zendesk's CX Trends research consistently shows trust and control as the gating factors for support automation adoption, and suggest-only is how you earn both internally.
  • Promote to autonomous within the policy boundary. Once the agent's drafted actions match human decisions at a rate you are comfortable with, let it act on the in-policy cases and keep routing edge cases to approval.
  • Watch reversal rate and escalation quality, not just resolution rate. If automated actions are getting undone, your policy boundary is too loose. If escalations are landing without context, fix the handoff before you add scope.
  • Add the next action only after the current one is boring. Boring is the goal. A second action type, a higher threshold, a new ticket category, one at a time.
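The promotion decision in this rollout can be reduced to a small check over suggest-only data. The thresholds below (200 reviewed cases, 95% agreement, 2% reversals, 90% clean escalations) are hypothetical placeholders, not recommendations, and comparing drafts by string equality is a simplification; a real system would compare structured actions such as type, amount, and target account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewedAction:
    agent_draft: str      # what the agent proposed in suggest-only mode
    human_decision: str   # what the reviewing human actually approved
    reversed_later: bool  # whether the executed action was later undone


@dataclass
class Escalation:
    human_reasked_customer: bool  # True if the human had to re-collect context


def rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0


def ready_to_promote(
    actions: List[ReviewedAction],
    escalations: List[Escalation],
    min_sample: int = 200,
    min_agreement: float = 0.95,
    max_reversal: float = 0.02,
    min_clean_escalations: float = 0.90,
) -> bool:
    """Decide whether one action type can move from suggest-only to autonomous."""
    if len(actions) < min_sample:
        return False  # not enough evidence yet
    agreement = rate(sum(a.agent_draft == a.human_decision for a in actions), len(actions))
    reversal = rate(sum(a.reversed_later for a in actions), len(actions))
    # With no escalations there is nothing to fault, so treat the rate as perfect.
    if escalations:
        clean = rate(sum(not e.human_reasked_customer for e in escalations), len(escalations))
    else:
        clean = 1.0
    return (
        agreement >= min_agreement
        and reversal <= max_reversal
        and clean >= min_clean_escalations
    )
```

Keeping all three conditions in one function means a slice cannot be promoted on agreement alone while reversals or cold handoffs are climbing.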

The pricing model matters as much as the rollout, because there is a trap here that quietly kills Tier 2 ROI. Many automation tools bill per resolution, which punishes exactly the multi-step work Tier 2 involves. A single Tier 2 case might mean three lookups, a credit, a refund attempt, and an escalation. Under per-resolution pricing, complex work inflates the bill at the worst possible moment. Flat-fee or capacity-based models keep the economics sane when the work is genuinely multi-step.

How OpenNash CX Can Help

If you are deciding between a platform and a custom build, here is the honest split. If your Tier 2 work is standard (returns, simple plan changes) and lives entirely inside one helpdesk, a packaged automation feature from your existing vendor is often the right first move. If your Tier 2 actions span the helpdesk plus backend systems, need custom permission gates, or require cross-channel memory that carries into escalation, that is where off-the-shelf tends to hit a wall, and where a custom build earns its keep. And if you don't yet have your action ledger written down, you are not ready for either. Map the actions first.

OpenNash CX builds production Tier 2 support agents that plug into the tools your team already runs (Zendesk plus your backend APIs), with the controls described above: an action ledger turned into real permission gates, write-backs that keep every action auditable, and escalation handoffs that carry full cross-channel context to the human. The work follows a fixed path: audit your current ticket flow and action inventory, design the guardrails and approval protocols, build and test against your real cases, then deploy with full ownership handoff so your team owns the system. Pricing is flat-fee, so multi-step Tier 2 work does not inflate a per-resolution bill.

If you want to map this pattern to your own escalation flow, book a call and we will start with your action ledger.

The teams that get Tier 2 automation right are not the ones with the most autonomous agent. They are the ones whose escalations are so clean that customers barely notice where the software stopped and the human began. Build that handoff, gate the actions that can hurt you, and let the deflection number take care of itself.