A company I talked to last quarter spent five months building their customer support agent on a managed platform. Worked great in demo. Worked great for the first 200 tickets. Then they needed to add a custom retrieval step that the platform didn't support, and they discovered their entire workflow - every prompt chain, every routing rule, every integration credential - was locked inside a proprietary format they couldn't export.

They rebuilt from scratch on a different stack. Five months of work, gone.

This isn't a rare story. It's the default outcome when teams pick an AI agent platform based on "which one has the coolest demo" instead of "which one won't hold my business logic hostage."

The Real Lock-In Isn't Where You Think

Most platform comparison articles focus on model support. Can it run GPT-4o? Claude? Gemini? That's the wrong question. Model providers have largely converged on compatible APIs, and abstraction libraries like LiteLLM make swapping models a one-line change.

The dangerous lock-in happens one layer up: the orchestration layer. This is where your business logic actually lives - the routing rules, the retry strategies, the prompt chains, the tool-calling sequences. When that logic is encoded in a vendor's proprietary visual builder or domain-specific language, you're not just using a platform. You're married to it.

Chip Huyen nails this in her AI engineering pitfalls post: teams over-invest in frameworks before understanding their actual requirements. The framework becomes load-bearing before anyone realizes the foundation is rented.

Here's the hierarchy of lock-in risk, from lowest to highest:

| Layer | Lock-In Risk | Why |
| --- | --- | --- |
| Model provider | Low | Standard APIs, easy to swap |
| Vector database | Low-Medium | Data formats are similar, migration scripts exist |
| Orchestration/workflow engine | High | Business logic, prompt chains, routing rules |
| Credential management | High | OAuth tokens, API keys, service accounts |
| Monitoring/observability | Medium | Logs and traces are somewhat portable |

The orchestration layer is where your competitive advantage lives. It's also where vendors have the strongest incentive to make you sticky.

The Four-Axis Evaluation Framework

Stop comparing feature lists. Instead, evaluate every AI agent platform on these four dimensions before writing a single line of code.

Axis 1: Data Portability

Can you get your stuff out?

This sounds basic, but most teams don't test it until they need to leave. The questions to ask:

  • Export format: Are workflows stored as standard JSON, Python, or YAML? Or a proprietary binary format?
  • Version control: Can you check your entire agent configuration into Git?
  • Completeness: Does the export include prompt templates, tool configurations, and routing logic - or just the "nodes"?

Red flag: If the vendor's export feature produces a format that only their import feature can read, that's not portability. That's a zip file with extra steps.

Green flag: n8n stores workflows as JSON that's human-readable and Git-trackable. LangGraph agents are just Python code. Prefect flows are Python decorated functions. These are portable by default because the format is the standard.
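You can turn the completeness question into a script. Here's a minimal sketch of an export audit, assuming a JSON export; the section names (`nodes`, `connections`, `prompts`, `credentials`) are placeholders for whatever your platform's export schema actually uses:

```python
import json

# Sections we expect a complete workflow export to contain. These key
# names are illustrative -- match them to your platform's actual schema.
REQUIRED_SECTIONS = ["nodes", "connections", "prompts", "credentials"]

def audit_export(path: str) -> list[str]:
    """Return the expected sections missing from an exported workflow file."""
    with open(path) as f:
        workflow = json.load(f)
    return [key for key in REQUIRED_SECTIONS if key not in workflow]
```

If the audit keeps coming back with missing sections, you've found the "zip file with extra steps" in practice.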

Axis 2: Credential Ownership

Who holds the keys?

This is the sleeper issue that bites hardest during migration. When you connect your Slack workspace, CRM, or email provider through a managed platform, those OAuth tokens and API credentials often live in the platform's vault - not yours.

Ask specifically:

  • Do you hold the API keys and OAuth tokens, or does the platform proxy them?
  • Can you export credentials in a format another system can consume?
  • If the platform goes down, can your integrations still function?

Stripe's engineering blog published a detailed analysis of credential delegation patterns that's worth reading. The short version: any system where a third party holds your authentication tokens is a system where that third party can hold your business continuity hostage.

Some platforms handle this well. Beam AI, for example, lets you bring your own credentials for most integrations. Others act as a credential proxy, meaning your tokens are encrypted in their vault and inaccessible outside their system.

Axis 3: Orchestration Coupling

How deep do the vendor's tentacles reach into your logic?

This is the axis that determines your true exit cost. Map every AI agent platform to one of three categories:

Category A - Code-native orchestration: Your agent logic is written in a general-purpose language (Python, TypeScript). The platform provides libraries, not a runtime jail. Examples: LangGraph, CrewAI, raw API calls with your own orchestration.

Category B - Open-format visual orchestration: Visual builder that stores workflows in an open, documented format. You can edit the JSON directly, run it on self-hosted infrastructure, and extend it with code nodes. Example: n8n.

Category C - Proprietary orchestration: Visual builder with a closed format. Workflows only run on the vendor's infrastructure. Logic can't be extracted without reverse engineering. Examples: many "no-code AI agent" startups that launched in 2024-2025.

Category A gives you maximum control but requires engineering investment. Category B is the sweet spot for most teams - visual enough for rapid iteration, open enough to avoid lock-in. Category C is acceptable only for throwaway prototypes.
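To make Category A concrete: the routing rules and retry strategies live in ordinary code you can read, test, and move. A minimal sketch (the `call_model` function is a stub standing in for a real provider SDK call):

```python
import time

def call_model(prompt: str) -> str:
    """Stand-in for a real model call (OpenAI, Anthropic, etc.)."""
    return f"answer to: {prompt}"

def classify(ticket: str) -> str:
    """Routing rule as plain code, not a vendor DSL."""
    return "billing" if "invoice" in ticket.lower() else "general"

def with_retries(fn, attempts: int = 3, delay: float = 0.5):
    """Retry strategy as an ordinary, portable function."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

def handle_ticket(ticket: str) -> str:
    route = classify(ticket)
    prompt = f"[{route} queue] Respond to: {ticket}"
    return with_retries(lambda: call_model(prompt))
```

Nothing here depends on any runtime except Python itself, which is exactly the point.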

Martin Fowler's concept of sacrificial architecture applies here: if you know you'll outgrow the platform, design for the migration from day one.

Axis 4: Exit Cost

What does leaving actually cost?

Don't estimate this abstractly. Run the exercise. Pick a real workflow you've built and calculate:

  • Hours to recreate the workflow on a different platform
  • Integrations to re-authenticate (every OAuth flow, every API key rotation)
  • Prompts to re-test (prompt behavior is sensitive to the orchestration context)
  • Downstream systems to update (webhooks, API endpoints, monitoring dashboards)

If the answer is "a week for one workflow," and you have 30 workflows, you're looking at over six months of migration work. That's the real cost of the "free trial" you signed up for.
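The arithmetic is simple enough to script, and worth doing with your own numbers rather than mine:

```python
def exit_cost_weeks(workflows: int, hours_per_workflow: float,
                    hours_per_week: float = 40.0) -> float:
    """Back-of-envelope migration estimate for one engineer."""
    return workflows * hours_per_workflow / hours_per_week

# 30 workflows at a week (40 hours) each: 30 engineer-weeks,
# i.e. roughly seven calendar months for one person.
print(exit_cost_weeks(30, 40))
```

Plug in your real workflow count and a measured per-workflow rebuild time, not a hopeful one.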

The Day 90 Test

Here's a concrete exercise I recommend to every client evaluating platforms. I call it the Day 90 Test.

Before you commit to any AI agent platform, build one real workflow on it. Not a demo. A workflow that touches production data, connects to real integrations, and runs on a schedule.

Then, on day 90, try to move it somewhere else. Specifically:

  1. Export the complete workflow including all prompt templates, tool configs, and routing logic
  2. Export or recreate all credentials for connected services
  3. Deploy it on different infrastructure (your own server, a different cloud, a competitor's platform)
  4. Verify it produces identical outputs for the same inputs

If you can do this in under a week with one engineer, the platform passes. If it takes longer, or if steps 1-3 are impossible, you've found your lock-in.
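Step 4 is easy to automate. A sketch of a parity check, where `run_old` and `run_new` are callables you'd wrap around each deployment's real endpoint:

```python
def parity_check(run_old, run_new, test_inputs) -> list:
    """Return the inputs on which the two deployments disagree.

    run_old / run_new are callables wrapping each platform's endpoint;
    in practice they'd make real HTTP calls.
    """
    return [x for x in test_inputs if run_old(x) != run_new(x)]
```

One caveat: LLM outputs aren't always deterministic, so pin temperature to zero or compare on structured fields (route chosen, tool called) rather than raw text.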

No vendor will volunteer this test. But any vendor confident in their product's value - rather than its stickiness - should welcome it.

Platform Categories: A Buyer's Map

Rather than ranking specific products (those lists are stale before they're published), here's how to think about the current market in categories.

Managed Agent Builders

Examples: Relevance AI, Beam AI, Wordware

Best for: Teams without dedicated AI engineering staff who need agents running within weeks.

Lock-in profile: High orchestration coupling, mixed credential ownership. Workflows typically can't run outside the vendor's infrastructure. Exit cost scales linearly with the number of agents deployed.

The honest tradeoff: You're paying for speed-to-value with long-term flexibility. That's a legitimate choice for internal tools and non-core processes. It's a dangerous choice for agents that are part of your product.

Open-Core Workflow Engines

Examples: n8n, Windmill, Temporal

Best for: Teams with some engineering capacity who want visual building with escape hatches to code.

Lock-in profile: Low. Workflows are JSON or code, self-hostable, and Git-trackable. Credentials stay in your infrastructure. The n8n self-hosting model is representative: you own the data, the config, and the runtime.

The honest tradeoff: More setup time, more infrastructure responsibility. You need someone who can maintain Docker containers and debug workflow execution logs. But your business logic never leaves your control.

Code-First Frameworks

Examples: LangGraph, AutoGen, DSPy

Best for: Engineering teams building agents as a core product capability.

Lock-in profile: Very low at the orchestration layer. Your agents are Python (or TypeScript) code. The risk shifts to framework-specific abstractions - Harrison Chase's team at LangChain addressed this directly by making LangGraph's state management independent of LangChain itself.

The honest tradeoff: Maximum portability, maximum engineering investment. You're building and maintaining the orchestration yourself. This only makes sense when the agent is a core differentiator, not a back-office efficiency tool.

The Decision Heuristic

Ask one question: Is this agent a product feature or an internal tool?

If it's a product feature (customer-facing, revenue-generating, competitively differentiating), use code-first or open-core. You cannot afford to have your product's core logic locked in a vendor's proprietary format.

If it's an internal tool (operations automation, internal support, data processing), managed builders are a reasonable choice. The exit cost is real but bounded - you're optimizing for speed, not sovereignty.

What Good Portability Looks Like in Practice

Let me get specific. Here's what a portable AI agent architecture looks like at a mid-size company running 15-20 automated workflows:

Workflow definitions: Stored as JSON or Python in a Git repository. Every change is a commit. Every deployment is a merge to main. GitOps for AI workflows isn't optional - it's the foundation of portability.

Prompt templates: Separate files, version-controlled alongside the workflows. Not embedded in a vendor's GUI with no export option. The OpenAI practical guide to building agents emphasizes treating prompts as first-class software artifacts. That means they live in your repo, not in a vendor's database.

Credentials: Managed through your own secrets manager (HashiCorp Vault, AWS Secrets Manager, or even encrypted environment files). The workflow engine references secrets by name - it never stores them directly.
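The "reference secrets by name" pattern can be this small. Here the backend is environment variables for illustration; swapping in a Vault or AWS Secrets Manager client changes this one function, not your workflows:

```python
import os

def get_secret(name: str) -> str:
    """Resolve a secret by name from your own store.

    Environment variables are the simplest possible backend; replace
    this lookup with a Vault or Secrets Manager client as needed.
    """
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} not configured")
    return value

# A workflow references "SLACK_API_TOKEN" by name; the value itself
# never appears in the workflow definition or the Git repo.
```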

Model abstraction: A thin layer that maps model calls to a standard interface. When you need to swap from Claude to GPT-4o for a specific workflow (cost, latency, capability reasons), it's a config change, not a rewrite.
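That thin layer doesn't need to be a framework. A hedged sketch, with the provider calls stubbed out (each branch would wrap the real SDK in practice):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    provider: str   # e.g. "anthropic" or "openai"
    model: str      # e.g. a model name from that provider

def complete(cfg: ModelConfig, prompt: str) -> str:
    """Dispatch to a provider-specific client behind one interface."""
    if cfg.provider == "anthropic":
        return _call_anthropic(cfg.model, prompt)  # stub
    if cfg.provider == "openai":
        return _call_openai(cfg.model, prompt)     # stub
    raise ValueError(f"unknown provider: {cfg.provider}")

def _call_anthropic(model: str, prompt: str) -> str:
    return f"[{model}] {prompt}"

def _call_openai(model: str, prompt: str) -> str:
    return f"[{model}] {prompt}"
```

Switching a workflow's provider is now a change to its `ModelConfig`, not a rewrite of its logic.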

Monitoring: Logs and traces exported to your own observability stack (Datadog, Grafana, even structured log files). Not trapped in the vendor's dashboard.
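Portable observability mostly means structured logs on a surface any shipper can read. A minimal sketch, emitting one JSON line per event to stdout:

```python
import json
import sys
import time

def log_event(event: str, **fields) -> None:
    """Emit one structured JSON log line to stdout.

    Any log shipper (Datadog agent, Fluent Bit, or a plain file tail)
    can collect these -- nothing lives only in a vendor dashboard.
    """
    record = {"ts": time.time(), "event": event, **fields}
    sys.stdout.write(json.dumps(record) + "\n")
```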

This setup takes 2-3 weeks longer to establish than signing up for a managed platform. But the first time you need to change a model provider, move to different infrastructure, or debug a production failure at 2 AM, you'll be glad you own the whole stack.

The Six-Month Rework Trap

I keep coming back to that five-month rebuild story because it's not an outlier. I've seen versions of it at three different companies in the last year alone.

The pattern is always the same:

  1. Team evaluates platforms based on features and demos
  2. Team picks the one that looks most polished
  3. Team builds 5-10 workflows over 3 months
  4. Team hits a wall (missing feature, scaling limit, pricing change, acquisition)
  5. Team discovers they can't extract their work
  6. Team rebuilds from scratch

The fix isn't to avoid platforms entirely. The fix is to evaluate platforms on exit cost with the same rigor you evaluate them on entry experience. Run the Day 90 Test. Check the four axes. And if a vendor gets defensive when you ask about portability, that tells you everything you need to know.

Your AI agents are going to be around for years. Pick the platform like you're choosing a foundation, not a demo.