A health system in the Midwest signed a contract with an AI customer service vendor last year. The vendor had "HIPAA compliant" on every page of the deck. Six weeks into deployment, the compliance team asked a question the sales engineer could not answer: which subprocessor receives the raw prompt when a patient asks about a lab result, and is there a Business Associate Agreement (BAA) with them?
The answer turned out to be a tracing platform the AI vendor used for debugging. No BAA. Every patient question for six weeks had been shipped to a third party with no BAA in place. The deployment got rolled back. The vendor lost the contract. The compliance lead got a new line item on her resume.
This is the gap the buyer guides do not cover. The marketing tier of HIPAA compliance is easy. The procurement tier is where deals die, and it is also where most AI customer service vendors are unprepared to defend their architecture. This piece is a buyer-side checklist for healthcare ops and compliance leaders evaluating any AI customer service vendor, written from the perspective of the questions that actually catch problems.
The Two Tiers of HIPAA Compliance Most Buyers Confuse
There is "HIPAA compliant" the badge, and "HIPAA compliant" the operational reality. The badge requires a signed Business Associate Agreement, a SOC 2 report, and some boilerplate about encryption. The operational reality requires a vendor that can answer specific questions about how protected health information moves through their system, where it stops, and who else sees it.
The U.S. Department of Health and Human Services publishes HIPAA Security Rule guidance that defines the baseline: administrative, physical, and technical safeguards. On the technical side that means encryption, access controls, audit controls, integrity controls, and transmission security. None of that is new. What is new is that AI agents introduce a longer data path than traditional support software, and that path is where the gaps hide.
A traditional support tool processes a ticket through a defined workflow: form submission, database write, agent routing, response. An AI agent expands this into a chain of LLM calls, tool invocations, retrieval steps, and intermediate reasoning. Every hop is a potential PHI exposure point. Every hop needs to be inside your BAA perimeter.
Most vendor security pages list the inputs and outputs. The buyer-side checklist below targets the middle.
The 12-Question BAA and Architecture Checklist
Run this against any AI customer service vendor before you sign. The questions are ordered roughly by how often vendors fail them.
1. Will you sign a BAA, and what does it specifically cover?
Easy yes from most enterprise vendors. The follow-up is harder: does the BAA cover all subprocessors, or only the vendor's first-party infrastructure? Ask for the full subprocessor list with BAA status for each. If they list "AWS" but cannot produce the BAA, that is a stop.
2. Which LLM provider processes PHI, and is there a BAA with that provider?
OpenAI offers BAAs on its enterprise and API platform tiers. Anthropic offers BAAs for Claude on its commercial API. Google Cloud's Vertex AI, Amazon Bedrock, and Azure OpenAI all support BAAs under their respective cloud agreements. Eligibility can depend on product tier and configuration, so ask the vendor to show the executed BAA rather than cite the provider's policy. If your vendor uses any model outside this list for healthcare workloads, ask hard questions about why.
3. Are prompts and outputs excluded from model training?
Table stakes. Every reputable vendor offers this. Get it in writing in the BAA, not just on the marketing page. The Lorikeet team's overview of AI support for healthcare in 2026 notes that a training opt-out alone is not sufficient if logging still occurs outside the BAA boundary.
4. Where are tool calls, retrieved context, and intermediate reasoning traces logged?
This is where most vendors stumble. AI agents call tools, retrieve documents, and produce chain-of-thought traces, and all of these can contain PHI. If the vendor routes traces to a third-party observability platform without a BAA, that is a breach surface. Ask for the logging architecture diagram. If they cannot produce one by the follow-up call, treat the question as failed.
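One pattern that addresses this is to allowlist the trace fields that may leave the BAA perimeter. The sketch below is illustrative only: the span shape, the field names in `SAFE_FIELDS`, and the `export_span` helper are assumptions, not any observability vendor's API.

```python
# Hypothetical span fields assumed to carry no patient content.
SAFE_FIELDS = {"span_id", "parent_id", "step", "tool_name", "duration_ms", "status"}


def strip_content(span: dict) -> dict:
    """Keep only allowlisted metadata; drop prompts, tool arguments, and outputs."""
    return {key: value for key, value in span.items() if key in SAFE_FIELDS}


def export_span(span: dict, sink_has_baa: bool) -> dict:
    """Return the payload that would be sent to a trace sink.

    Sinks covered by a BAA receive the full span; any other sink
    receives metadata only.
    """
    return dict(span) if sink_has_baa else strip_content(span)


# Illustrative span from a single tool call (all values are made up).
span = {
    "span_id": "s-2",
    "parent_id": "s-1",
    "step": "tool_call",
    "tool_name": "lookup_lab_result",
    "tool_args": {"patient_id": "example-123"},
    "output": "A1C 6.1%",
    "duration_ms": 412,
    "status": "ok",
}
external = export_span(span, sink_has_baa=False)
```

An allowlist is used rather than a denylist so that any new field an engineer adds to a span is dropped by default until someone decides it is safe to export.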
5. What is the data flow for a single patient interaction, end to end?
Ask for a diagram before the demo. Where does PHI enter? Where is it parsed, embedded, retrieved, sent to the model, returned, logged, and surfaced to a human agent? The vendor should be able to draw this on a whiteboard in five minutes. The healthcare-focused Iternal AI buyer comparison makes a similar point: the diagram tells you more than the certifications.
6. How is PHI redacted or minimized before reaching the LLM?
The HIPAA Privacy Rule's minimum necessary standard limits PHI use and disclosure to what the task actually requires. Strong vendors implement de-identification or field-level redaction before the prompt reaches the model, using deterministic pre-processing to strip names, MRNs, and dates of birth where the task does not need them. Ask which fields the vendor strips, which they pass through, and why.
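A minimal sketch of deterministic pre-processing, assuming simple regular expressions are good enough for illustration. The patterns, the `redact` function, and the sample message are all hypothetical; real MRN and date formats vary by institution, and pattern matching alone does not amount to HIPAA de-identification.

```python
import re
from typing import Sequence

# Hypothetical patterns for illustration only.
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str, known_names: Sequence[str] = ()) -> tuple:
    """Replace matched identifiers with placeholders.

    Returns the redacted text and a count of replacements per label,
    which can be logged without logging the values themselves.
    """
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    # Names come from the patient record, not from guessing at free text.
    for name in known_names:
        text, n = re.subn(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
        counts["NAME"] = counts.get("NAME", 0) + n
    return text, counts


message = "Jane Doe, MRN: 12345678, born 04/12/1986, asks about her A1C result."
clean, counts = redact(message, known_names=["Jane Doe"])
# clean == "[NAME], [MRN], born [DOB], asks about her A1C result."
```

The per-label counts give the vendor something concrete to show you: which fields were stripped on a given request, without revealing what they contained.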
7. Where is data stored and for how long?
Region matters. Retention matters more. Some vendors retain conversation logs for 30 days for debugging, others for 12 months for product improvement. You want short retention by default, configurable per workspace, with documented deletion guarantees. The Fin AI guide to HIPAA-compliant agents explicitly calls out retention as a frequent gap.
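To make "configurable per workspace" concrete, here is a rough sketch of a retention check. The `Conversation` record, the workspace names, and the 30-day default are assumptions chosen for the example, not a description of any vendor's system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30  # assumed default for this sketch


@dataclass
class Conversation:
    conversation_id: str
    workspace: str
    created_at: datetime


def due_for_deletion(conversations, retention_by_workspace, now):
    """Return IDs of conversations older than their workspace's retention window."""
    due = []
    for c in conversations:
        days = retention_by_workspace.get(c.workspace, DEFAULT_RETENTION_DAYS)
        if now - c.created_at > timedelta(days=days):
            due.append(c.conversation_id)
    return due


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
conversations = [
    Conversation("a", "clinic-a", datetime(2026, 2, 20, tzinfo=timezone.utc)),
    Conversation("b", "clinic-b", datetime(2026, 2, 20, tzinfo=timezone.utc)),
    Conversation("c", "clinic-b", datetime(2026, 1, 15, tzinfo=timezone.utc)),
]
# clinic-a has opted into a 7-day window; clinic-b falls back to the default.
overdue = due_for_deletion(conversations, {"clinic-a": 7}, now)
```

A documented deletion guarantee would also need proof that the purge actually ran, which is a question for the audit log rather than this check.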
8. Who at the vendor can access patient data, and under what conditions?
Production access to PHI should be break-glass, logged, and reviewable. Ask how many employees have standing access. If the answer is "engineering can query the database," that is the wrong answer. The right answer involves role-based access, ticket-driven access, and an audit trail you can request.
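As an illustration of what break-glass, ticket-driven access can look like, consider the sketch below. The `Grant` record, the function names, and the 60-minute default are hypothetical; the point is only that access is tied to a ticket, expires on its own, and leaves an audit entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    engineer: str
    ticket_id: str
    expires_at: datetime


def open_break_glass(engineer, ticket_id, now, audit_log, ttl_minutes=60):
    """Issue a time-boxed grant and record it; refuse requests without a ticket."""
    if not ticket_id:
        raise PermissionError("break-glass access requires a ticket reference")
    grant = Grant(engineer, ticket_id, now + timedelta(minutes=ttl_minutes))
    audit_log.append({
        "action": "grant",
        "engineer": engineer,
        "ticket": ticket_id,
        "expires_at": grant.expires_at.isoformat(),
    })
    return grant


def can_read_phi(grant, now):
    """No standing access: a missing or expired grant means no read."""
    return grant is not None and now < grant.expires_at


audit_log = []
start = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
grant = open_break_glass("eng-1", "TICKET-42", start, audit_log)
```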
9. What is the audit log surface, and can it be exported?
Your compliance team needs immutable logs of every PHI access, every agent response, every escalation, and every override. You need to be able to export these for your own audit pipeline. Vendors who only offer a UI for log review fail this. Vendors with a CSV export pass minimally. Vendors with a streaming log feed into your SIEM pass cleanly.
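One common way to make an exported log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how that could be done, not a description of any vendor's implementation; the event fields and function names are made up, and a hash chain detects edits rather than preventing them.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


def export_jsonl(log: list) -> str:
    """One JSON object per line, a shape most log pipelines can ingest."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in log)


log = []
append_event(log, {"type": "phi_access", "actor": "agent-7", "record": "example-123"})
append_event(log, {"type": "escalation", "actor": "agent-7", "reason": "dosage"})
```

If you receive the export and the chain verifies on your side, you have some assurance the log was not edited between the vendor's system and your audit pipeline.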
10. How do you handle clinical-safety guardrails and escalation?
This is the soft compliance question that has hard regulatory implications. An AI agent that confidently answers a question it should escalate is a patient-safety incident waiting to be litigated. Ask about: refusal patterns, escalation triggers, human-in-the-loop checkpoints for medication, dosage, or symptom questions, and how the vendor measures false-confidence rates. Anthropic's research on building effective agents is a useful reference for what a well-designed escalation path looks like.
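A deliberately crude sketch of escalation triggers and one possible way to measure a false-confidence rate. Everything here is an assumption for illustration: the keyword patterns, the `route` function, and the definition of the rate as the share of must-escalate messages that were answered anyway. Production systems would likely use classifiers and clinician-reviewed test sets rather than keywords.

```python
import re

# Hypothetical trigger patterns for medication, dosage, and symptom questions.
ESCALATION_PATTERNS = {
    "medication": re.compile(r"\b(medication|prescription|refill|pill)s?\b", re.IGNORECASE),
    "dosage": re.compile(r"\b(dose|dosage|how much)\b|\d+\s*(mg|ml)\b", re.IGNORECASE),
    "symptom": re.compile(r"\b(pain|bleeding|fever|dizzy|dizziness|chest|breathing)\b", re.IGNORECASE),
}


def route(message: str):
    """Return ("escalate_to_human", reasons) if any trigger fires, else ("answer", [])."""
    reasons = [label for label, pattern in ESCALATION_PATTERNS.items() if pattern.search(message)]
    return ("escalate_to_human", reasons) if reasons else ("answer", [])


def false_confidence_rate(labeled) -> float:
    """Share of messages labeled must-escalate that the router answered anyway."""
    must_escalate = [message for message, should in labeled if should]
    if not must_escalate:
        return 0.0
    missed = sum(1 for message in must_escalate if route(message)[0] == "answer")
    return missed / len(must_escalate)


decision = route("Can I take a double dose of my medication?")
labeled = [
    ("Can I double my dose?", True),
    ("I feel strange since yesterday", True),  # no keyword fires, so this one is missed
    ("Where do I park?", False),
]
rate = false_confidence_rate(labeled)
```

The second labeled message shows why the measurement matters: a vague symptom report slips past keyword triggers, and only a labeled evaluation set reveals it.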
11. What does your incident response look like for a suspected PHI exposure?
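HIPAA's Breach Notification Rule requires notifying affected individuals without unreasonable delay and no later than 60 calendar days after a breach is discovered. A business associate's own outer limit for notifying you is also 60 days, which leaves you no time to act, so your vendor needs a contractual commitment to notify you much faster than the rule requires. Ask for the SLA on incident notification. Ask for a sample incident report from a prior event. If they have never had an incident, ask how they would handle one tomorrow.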
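The arithmetic behind that clock is worth doing before you negotiate the SLA. The sketch below assumes, conservatively, that the 60-day window starts on the day the vendor discovers the exposure; the function name and the example figures for the vendor SLA and your internal preparation time are hypothetical.

```python
from datetime import date, timedelta

INDIVIDUAL_NOTICE_DAYS = 60  # outer limit for notifying affected individuals


def breach_timeline(vendor_discovery: date, vendor_sla_days: int, internal_prep_days: int) -> dict:
    """Work out how much slack remains once the vendor SLA and your own prep are spent."""
    deadline = vendor_discovery + timedelta(days=INDIVIDUAL_NOTICE_DAYS)
    vendor_notice_by = vendor_discovery + timedelta(days=vendor_sla_days)
    slack_days = (deadline - vendor_notice_by).days - internal_prep_days
    return {
        "deadline": deadline,
        "vendor_notice_by": vendor_notice_by,
        "slack_days": slack_days,
    }


# Example: vendor discovers on 2 March 2026, has a 10-day notification SLA,
# and your team needs roughly 30 days to investigate and prepare notices.
timeline = breach_timeline(date(2026, 3, 2), vendor_sla_days=10, internal_prep_days=30)
```

A negative `slack_days` means the SLA on offer cannot meet the deadline even if everything else goes to plan.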
12. Can you provide a SOC 2 Type II report and HITRUST certification?
SOC 2 Type II is the floor. HITRUST is the ceiling. Most strong healthcare AI vendors carry both, or are on a documented roadmap to HITRUST. The Comm100 review of HIPAA-compliant support solutions notes that HITRUST is increasingly a contractual requirement at large health systems, not a nice-to-have.
The Subprocessor Problem Nobody Wants to Discuss
The cleanest way to fail a HIPAA audit in 2026 is to ship PHI to a subprocessor without a BAA. The reason this keeps happening with AI vendors is architectural: a modern AI agent stack typically includes a model provider, a vector database, an embedding API, an observability platform, an evaluation harness, and sometimes a separate tool execution sandbox. Every one of these is a potential subprocessor.
The Prosper AI 2026 framework guide walks through this dependency tree well. A short summary of what to ask for:
- The full subprocessor list with BAA status for each
- The data each subprocessor receives (full prompt, redacted prompt, metadata only)
- The retention policy at each subprocessor
- Whether the vendor can disable specific subprocessors per customer
A vendor that cannot answer the last question is probably running a single shared infrastructure for all customers. That is fine for most B2B SaaS. For healthcare, you want the ability to disable, swap, or isolate components as your compliance program requires.
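The four items above map naturally onto a small inventory you can fill in during the vendor call. This is a sketch under stated assumptions: the `Subprocessor` fields, the severity labels, the 30-day retention threshold, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subprocessor:
    name: str
    data_received: str  # "full_prompt", "redacted_prompt", or "metadata_only"
    has_baa: bool
    retention_days: Optional[int]  # None means the vendor could not say
    can_disable: bool


def review(subprocessors, max_retention_days=30):
    """Return (name, severity, finding) tuples for each gap found."""
    findings = []
    for s in subprocessors:
        if not s.has_baa:
            # Prompt content without a BAA is treated as a blocker;
            # metadata-only flows still need a human look.
            severity = "review" if s.data_received == "metadata_only" else "blocker"
            findings.append((s.name, severity, f"no BAA, receives {s.data_received}"))
        if s.retention_days is None:
            findings.append((s.name, "review", "retention policy undocumented"))
        elif s.retention_days > max_retention_days:
            findings.append((s.name, "review", f"retains data for {s.retention_days} days"))
        if not s.can_disable:
            findings.append((s.name, "review", "cannot be disabled per customer"))
    return findings


inventory = [
    Subprocessor("model-provider", "redacted_prompt", True, 0, False),
    Subprocessor("vector-db", "redacted_prompt", True, 30, True),
    Subprocessor("tracing-platform", "full_prompt", False, None, True),
]
findings = review(inventory)
blockers = [f for f in findings if f[1] == "blocker"]
```

With the sample inventory, the only blocker is the tracing platform that receives full prompts without a BAA, which is the same failure that opened this piece.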
Voice AI Has Its Own Set of Traps
Voice adds two complications. First, audio recordings are PHI when they contain identifiable health information, and they often do by the second sentence. Second, voice systems typically involve a speech-to-text vendor, an LLM, and a text-to-speech vendor, each of which is a separate subprocessor.
The Greetmate guide to HIPAA-compliant voice AI agents covers this in detail. The short version: ask whether audio is stored, where transcripts are stored, and whether the transcription happens inside the BAA perimeter or via an external API. Many voice AI vendors use third-party transcription services without BAAs. That is disqualifying for healthcare.
What Good Looks Like: The Buyer's Mental Model
A useful frame for evaluating any AI customer service vendor in healthcare is the four-layer test:
| Layer | What to verify | Common failure |
|---|---|---|
| Contractual | Signed BAA covering all subprocessors | BAA only covers vendor, not chain |
| Technical | Encryption, access controls, audit logs | Logs accessible to engineering by default |
| Operational | Incident response, breach notification SLA | No documented runbook |
| Architectural | Clear data flow, redaction, model isolation | Traces routed to third-party platforms |
A vendor that passes the first three but fails the fourth is the most common shape of risk. They look compliant on paper, they have the certifications, but the architecture has hidden subprocessors and hidden data paths. The fourth layer is what the 12-question checklist above is designed to test.
The BlockSurvey roundup of HIPAA-compliant AI tools is useful as a starting list of vendors, but a list is not a procurement decision. The decision comes from running each candidate through the architectural layer.
Build, Buy, or Customize
For healthcare buyers, the build-versus-buy decision has a HIPAA twist. Off-the-shelf platforms get you to a signed BAA quickly. They also lock you into the vendor's choice of subprocessors, their retention policies, and their observability stack. If any of those choices violate your internal compliance program, you have limited recourse.
Custom-built agents on infrastructure you control (your AWS account, your Azure tenant, your model provider relationships) give you the ability to enforce your compliance program end to end. The trade-off is implementation time and engineering ownership.
A reasonable rule of thumb:
- Choose a platform when your use case is generic (appointment reminders, basic intake, FAQ responses) and the platform's subprocessor list matches your compliance posture.
- Choose custom when your workflow involves clinical-decision support, multi-step PHI handling, or integration with EHRs and you need full audit control.
- Wait when your compliance program is being rewritten or you are mid-cycle on a HITRUST renewal. Locking in an AI vendor mid-audit is a procurement mistake.
How OpenNash CX Can Help
OpenNash CX builds custom AI customer service agents for regulated workflows, including healthcare. The compliance posture is the design constraint, not an afterthought. That means:
- Deployment into your cloud account (AWS, Azure, GCP) so PHI never leaves your BAA perimeter without explicit configuration.
- Model-provider relationships that already include BAAs with OpenAI, Anthropic, and major cloud-hosted model providers.
- Observability and evaluation tooling that runs inside the customer's environment, not on a shared third-party platform.
- Audit-grade logging for every PHI access, every model call, every escalation.
- Senior-led implementation with documented data flow diagrams from day one.
If you are mid-evaluation on an AI customer service vendor and the 12-question checklist has surfaced gaps, book a call to walk through the architectural patterns that close them. The point is not to displace every off-the-shelf platform. The point is to give compliance and operations leaders a clear path when the platform fails the checklist.
The Next Step
Run the 12 questions in your next vendor call. Write the answers down. Compare them across vendors. The shortlist that emerges will be shorter than the one you walked in with, which is the entire purpose of the exercise.
Healthcare buyers are not asking for the most capable AI agent. They are asking for the most capable AI agent that will not cost them a breach notification, a settlement, or a board-level incident. Those are different questions, and the second one is the one most vendor decks refuse to answer.