A friend running data infrastructure at a Series C company told me they had finally picked their AI vendor. Six months later, they had eighty agents, three thousand prompts, a memory store they could not export, and an eval suite that only ran inside the vendor's playground. When the vendor raised prices by forty percent at renewal, the procurement conversation lasted four minutes. They paid.

Buyers keep telling themselves that this is a story about models. It is not. It is about everything around the model: the prompts, the tools, the traces, the memory, the evals, the approval logic, the integration glue. The model is the easy part to swap. The operating layer around it is what holds you hostage.

This piece is about building the other version: one where vendors compete for workload, where switching is a normal Tuesday operation, and where the durable assets of your AI program belong to you.

The Real Shape of AI Vendor Lock-In

Vendor lock-in in AI does not look like classic lock-in. There is no proprietary database format, no five-year contract, no migration project measured in years. The API call looks almost identical across providers. That is the trap.

The lock-in lives in the surface area around the model. The Register documented what enterprise teams experience: in practice, switching providers means rebuilding prompts, re-tuning agents, re-collecting evaluation data, and recreating tool definitions, even when the underlying API surface is similar. The budget pain those teams reported is not the model bill. It is the cost of leaving.

A clearer way to see it: every AI program accumulates seven assets over time.

  1. Prompts and prompt versions
  2. Tool definitions and schemas
  3. Evaluation suites and labeled data
  4. Production traces and execution logs
  5. Long-term memory and retrieved context
  6. Agent and workflow definitions
  7. Integrations with internal systems

If five or more of those live exclusively inside one vendor's tooling, you are locked in regardless of whether you can technically call a different API. The Open Markets Institute has argued that the larger AI platforms are explicitly designed to make these assets non-portable, which is a structural concern beyond any single procurement decision.
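The five-of-seven rule is simple enough to run as a recurring audit. The sketch below records where each asset lives and applies the article's threshold. The asset identifiers, the three-way `Location` split, and the choice to count mirrored assets as not vendor-only are all illustrative assumptions. None of this comes from an existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    OWNED = "owned"              # lives in systems the company runs
    MIRRORED = "mirrored"        # vendor tooling, but copied into owned systems
    VENDOR_ONLY = "vendor_only"  # exists only inside one vendor's tooling


# The seven assets from the list above (identifiers are hypothetical).
ASSETS = (
    "prompts",
    "tool_definitions",
    "evals",
    "traces",
    "memory",
    "agent_definitions",
    "integrations",
)

# Rule of thumb from the text: five or more vendor-only assets means lock-in.
LOCK_IN_THRESHOLD = 5


@dataclass
class AuditResult:
    vendor_only: list[str]
    locked_in: bool


def audit(inventory: dict[str, Location]) -> AuditResult:
    """Count how many of the seven assets exist only inside vendor tooling."""
    missing = set(ASSETS) - set(inventory)
    if missing:
        raise ValueError(f"inventory is missing assets: {sorted(missing)}")
    vendor_only = [a for a in ASSETS if inventory[a] is Location.VENDOR_ONLY]
    return AuditResult(vendor_only, len(vendor_only) >= LOCK_IN_THRESHOLD)


# Example: prompts are owned, traces are mirrored, everything else is vendor-only.
inventory = {asset: Location.VENDOR_ONLY for asset in ASSETS}
inventory["prompts"] = Location.OWNED
inventory["traces"] = Location.MIRRORED
result = audit(inventory)
print(result.locked_in, result.vendor_only)
```

Treating a mirrored asset as not vendor-only is a judgment call. A mirror you have never restored from is weaker than an asset you run yourself.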

The buyer mistake is treating model choice as the lock-in question. Model choice is the cheapest thing to change. The harness around it is the expensive one.

What Open AI Strategy Actually Means

An open AI strategy is not "use open-source models." Open-weight models are useful, but a closed harness with open weights still locks you in. A better definition:

An open AI strategy is one where the model is swappable, but the business logic, evaluations, memory, traces, integrations, and operating surface are owned by the company.

This framing matches what Swfte's enterprise teardown of vendor lock-in and Sparkco's enterprise guide both push toward: the unit of ownership is the system around the model, not the model itself.

Three layers belong to the company, not the vendor:

The Harness. The execution surface where agents run. Prompts, tool calls, retries, approvals, guardrails, memory reads and writes. The harness is where your business rules live. Anthropic's own guidance on building effective agents hints at this without saying it directly: the patterns they recommend - routing, orchestrator-workers, evaluator-optimizer - are harness decisions, not model decisions.

The Observability Plane. Traces, evals, error analysis, regression tests. Hamel Husain's evals FAQ is unambiguous on this point: evaluation is the most important activity in any AI program, and generic vendor metrics rarely capture the failure modes that matter to your business. If your eval suite cannot run independent of a provider, you cannot honestly compare providers.
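In practice, "independent of a provider" means the eval runner only sees a plain function from prompt to text, so any vendor's client can be wrapped behind it. The sketch below assumes that shape. `ModelFn`, `EvalCase`, and `run_evals` are hypothetical names, and each case carries its own business-specific check rather than a generic vendor metric.

```python
from dataclasses import dataclass, field
from typing import Callable

# Any provider's client, wrapped behind one signature: prompt in, text out.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    check: Callable[[str], bool]  # business-specific pass/fail judgment


@dataclass
class EvalReport:
    model_name: str
    passed: int
    total: int
    failures: list[str] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def run_evals(model_name: str, model: ModelFn, cases: list[EvalCase]) -> EvalReport:
    """Run the same labeled cases against any model and report pass/fail per case."""
    failures = []
    for case in cases:
        try:
            ok = case.check(model(case.prompt))
        except Exception:
            ok = False  # a crash counts as a failure, not a skipped case
        if not ok:
            failures.append(case.case_id)
    return EvalReport(model_name, len(cases) - len(failures), len(cases), failures)
```

The same `cases` list runs unchanged against every candidate, and that is what makes a cross-provider comparison honest.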

The Memory and Data Layer. Long-term memory, retrieval indexes, customer context, and the records of what the agent did and why. If this layer is inside a vendor's managed memory product, you have outsourced the part of the system that gets more valuable over time. That is the worst trade in the stack.

The model? The model is a contractor. It does the work for a per-token rate. It should be swappable on a per-workload basis, not a per-vendor basis.

Three Architectures, Three Levels of Leverage

Here is how the three common patterns compare in practice.

  • Single-vendor stack. You own: API calls and some glue code. The vendor owns: prompts, evals, traces, memory, agent definitions, and the tool registry. Switching cost: high, effectively a rebuild. Best for: a first pilot, a demo-to-production sprint, or teams with no AI platform expertise.
  • Routing layer only. You own: routing logic and gateway config. The vendor owns: evals, traces, memory, the agent harness, and the tool registry. Switching cost: medium-high; you can change models but not behavior. Best for: cost optimization, basic failover, or teams already locked into a harness.
  • Owned harness + routing. You own: the harness, evals, traces, memory, tools, integrations, and agent definitions. The vendor owns: inference only. Switching cost: low; a model swap is a config change. Best for: production systems, regulated industries, and multi-year programs.

A routing layer alone is the most common false sense of security. Teams stand up LiteLLM or a similar gateway, route 70% of traffic to a cheaper model, and call it freedom. It is not. The moment a behavior regresses after a model update, you discover the regression in production because your evals live somewhere else. The moment you want to move agent state from one provider to another, you discover memory was never portable. The gateway solved the easy half.

The harder half is the harness. Build the harness once, route to many models, and the leverage flips. Vendors compete for your workloads. Pricing power moves to the buyer. New models become an A/B test, not a migration project. TrueFoundry's analysis of model gateways describes this position as the realistic end state for enterprise buyers in 2026.

The Four Assets Worth Owning

If you build only four things in-house, build these.

1. A portable trace and eval store.

Every agent run produces a trace: the inputs, the tool calls, the model outputs, the final decision, the latency, the cost. Store these in your own warehouse. Tag them with the model version, prompt version, and tool version that produced them. This is the single most leveraged piece of infrastructure in the program. It enables regression testing, error analysis, and honest cross-vendor comparison. LangSmith's observability documentation describes the shape of what you need, but the principle holds regardless of tool: traces belong in your data plane, not a vendor's.
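The sketch below shows one possible shape for such a trace record. It uses SQLite from the Python standard library only as a runnable stand-in for your warehouse. The table name, the columns, and the `cost_by_model` query are hypothetical. The point is that each row carries the model, prompt, and tool versions that produced it.

```python
import json
import sqlite3
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Trace:
    workload: str
    model_version: str
    prompt_version: str
    tool_version: str
    inputs: dict
    tool_calls: list
    output: str
    decision: str
    latency_ms: float
    cost_usd: float
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))


SCHEMA = """
CREATE TABLE IF NOT EXISTS traces (
    run_id TEXT PRIMARY KEY,
    workload TEXT NOT NULL,
    model_version TEXT NOT NULL,
    prompt_version TEXT NOT NULL,
    tool_version TEXT NOT NULL,
    inputs TEXT NOT NULL,      -- JSON
    tool_calls TEXT NOT NULL,  -- JSON
    output TEXT NOT NULL,
    decision TEXT NOT NULL,
    latency_ms REAL NOT NULL,
    cost_usd REAL NOT NULL
)
"""


def record_trace(conn: sqlite3.Connection, t: Trace) -> None:
    """Write one agent run into the company-owned trace table."""
    row = asdict(t)
    row["inputs"] = json.dumps(row["inputs"])
    row["tool_calls"] = json.dumps(row["tool_calls"])
    cols = ", ".join(row)
    placeholders = ", ".join(f":{c}" for c in row)
    with conn:  # commits on success
        conn.execute(f"INSERT INTO traces ({cols}) VALUES ({placeholders})", row)


def cost_by_model(conn: sqlite3.Connection, workload: str) -> list[tuple]:
    """Per-model run count, mean latency, and total cost for one workload."""
    return conn.execute(
        "SELECT model_version, COUNT(*), AVG(latency_ms), SUM(cost_usd) "
        "FROM traces WHERE workload = ? GROUP BY model_version ORDER BY model_version",
        (workload,),
    ).fetchall()
```

Because the version tags live in every row, a regression after a model update shows up as a query against your own data rather than a support ticket to a vendor.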

2. A memory layer separate from any model provider.

Long-term memory - customer preferences, prior conversations, learned procedures - is your accumulating moat. If it lives in a managed memory product tied to one provider, the moat belongs to the provider. Build it on infrastructure you already run. Postgres with pgvector handles the majority of production cases. The schema and the data are yours.
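The sketch below shows a provider-independent memory interface. It uses a pure in-memory store so it runs without a database. In a Postgres deployment, the same `search` would presumably become a query on a table with a pgvector `vector` column, ordered by the `<=>` cosine-distance operator. The class names and the `source_model` field are hypothetical, and embeddings are assumed to come from whichever embedding model you choose.

```python
import math
from dataclasses import asdict, dataclass


@dataclass
class Memory:
    customer_id: str
    text: str                # keep raw text so memories can be re-embedded later
    embedding: list[float]
    source_model: str        # which embedding model produced the vector


def cosine(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ; re-embed with one model")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryStore:
    """Owned memory layer: no provider SDK appears anywhere in this interface."""

    def __init__(self) -> None:
        self._rows: list[Memory] = []

    def write(self, m: Memory) -> None:
        self._rows.append(m)

    def search(self, customer_id: str, query: list[float], k: int = 3) -> list[Memory]:
        candidates = [m for m in self._rows if m.customer_id == customer_id]
        candidates.sort(key=lambda m: cosine(m.embedding, query), reverse=True)
        return candidates[:k]

    def export(self) -> list[dict]:
        """Plain dicts: the data leaves with you, in your schema."""
        return [asdict(m) for m in self._rows]
```

Storing the raw text alongside the vector is the design choice that matters. If you later switch embedding models, you can rebuild every vector from data you already hold.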

3. Versioned prompts and tool definitions.

Prompts and tool schemas are code. They belong in a repository, with versioning, code review, and tests. Treat any prompt that lives only in a vendor playground as untracked production code, because that is what it is.
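One lightweight way to treat prompts and tool schemas as code is to derive a version from their content. The version then changes exactly when the text changes, and that version string can be stamped onto every trace. In the sketch below, a dict stands in for files loaded from the repository. `PromptRegistry` and `content_version` are hypothetical names, and `string.Template` is just one reasonable templating choice.

```python
import hashlib
import json
from string import Template


def content_version(obj) -> str:
    """Hash canonical JSON so the version changes if and only if the content changes."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class PromptRegistry:
    """Prompts and tool schemas as reviewed, versioned artifacts (loaded from the repo)."""

    def __init__(self, prompts: dict[str, str], tools: dict[str, dict]) -> None:
        self._prompts = prompts
        self._tools = tools

    def render(self, name: str, **variables: str) -> tuple[str, str]:
        template = self._prompts[name]
        # substitute() raises KeyError if a variable is missing, which a CI test can catch.
        text = Template(template).substitute(**variables)
        return text, f"{name}@{content_version(template)}"

    def tool(self, name: str) -> tuple[dict, str]:
        schema = self._tools[name]
        return schema, f"{name}@{content_version(schema)}"
```

A CI job that renders every prompt with fixture variables and validates every tool schema gives prompts the same review-and-test loop as the rest of the codebase.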

4. A model gateway with per-workload routing.

Not for the cost savings, though those are real. For the optionality. The gateway is the seam that lets you A/B test a new model against production traffic without rewriting the harness. It is also where you implement fallback, rate limiting, and PII redaction in one place. Chip Huyen's pitfalls list warns against premature framework adoption, and that warning applies here: the gateway should be the simplest layer that does these jobs, not the most ambitious one.
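In keeping with that advice, the sketch below is a deliberately small gateway. It does weighted per-workload routing, falls back to other providers when one fails, and applies a redaction step in one place. Providers are plain callables, and all names are hypothetical. The email regex is only a placeholder for real PII redaction, and rate limiting is left out.

```python
import random
import re
from typing import Callable, Optional

ModelFn = Callable[[str], str]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    # Placeholder only: production PII redaction needs far more than one regex.
    return EMAIL.sub("[EMAIL]", text)


class Gateway:
    def __init__(
        self,
        providers: dict[str, ModelFn],
        routes: dict[str, dict[str, float]],  # workload -> {provider_name: weight}
        rng: Optional[random.Random] = None,
    ) -> None:
        self.providers = providers
        self.routes = routes
        self.rng = rng or random.Random()

    def _order(self, workload: str) -> list[str]:
        weights = self.routes[workload]
        names = list(weights)
        first = self.rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        # Sampled provider first, then the rest by descending weight as fallbacks.
        rest = sorted((n for n in names if n != first), key=lambda n: weights[n], reverse=True)
        return [first, *rest]

    def complete(self, workload: str, prompt: str) -> tuple[str, str]:
        """Return (provider_name, output); raise only if every provider fails."""
        clean = redact(prompt)
        errors = []
        for name in self._order(workload):
            try:
                return name, self.providers[name](clean)
            except Exception as exc:
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Swapping a model for one workload then means editing one entry in `routes`. The harness that calls `complete` does not change.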

These four assets do not require a moonshot. They require a deliberate decision to own them on day one, before the vendor's tooling absorbs them by default.

When Standardizing on One Vendor Is the Right Call

Open strategy is not a religion. There are real cases where picking one vendor and shipping fast is the correct move.

Pilots with a six-week deadline. The first agent inside a company with no AI engineering bench. A regulated workflow where the vendor's compliance posture is the gating factor. A team that needs to prove value before negotiating headcount for a platform layer.

The trap is not picking a vendor. The trap is letting the pilot's architecture become the company's architecture by default. The USCC report on China's AI strategy makes a related point at the national level: short-term efficiency from closed stacks compounds into long-term structural dependence. The same logic applies to a Series B startup choosing its first agent platform.

Two rules keep the pilot from becoming the cage:

  • Decide up front which of the seven assets you will own. Even if the pilot uses the vendor's tools, mirror prompts and traces into your own systems from week one.
  • Set a review checkpoint for when the program crosses ten agents or three workloads. That is the moment to ask whether the harness should move to a neutral surface.
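Both rules are concrete enough to check automatically, for example in a weekly report. In the sketch below, `PilotState` and `pilot_warnings` are hypothetical. It reads "crosses ten agents" as ten or more, and it treats prompts and traces as the minimum set that must be mirrored from week one, following the first rule.

```python
from dataclasses import dataclass, field

REQUIRED_FROM_WEEK_ONE = {"prompts", "traces"}
AGENT_LIMIT = 10
WORKLOAD_LIMIT = 3


@dataclass
class PilotState:
    agents: int
    workloads: int
    mirrored_assets: set[str] = field(default_factory=set)  # owned or mirrored in-house
    review_done: bool = False


def pilot_warnings(s: PilotState) -> list[str]:
    """Flag violations of the two rules that keep a pilot from becoming the architecture."""
    warnings = []
    for asset in sorted(REQUIRED_FROM_WEEK_ONE - s.mirrored_assets):
        warnings.append(f"{asset} are not mirrored into owned systems")
    checkpoint = s.agents >= AGENT_LIMIT or s.workloads >= WORKLOAD_LIMIT
    if checkpoint and not s.review_done:
        warnings.append("checkpoint reached: review whether the harness should move")
    return warnings
```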

This is the fair version of the recommendation. Single-vendor is fastest. It is also the version with the worst renewal leverage. Plan the exit before you sign the entry.

How Open Strategy Changes Your Procurement Conversation

The practical test of an open AI strategy is what your next renewal looks like.

In a locked stack, renewal is a budget conversation with one outcome: pay more. In an open stack, renewal is a workload review. Which workloads is this vendor still the best fit for? Which have moved to a cheaper or better model? What share of total tokens still flows here? The vendor knows you can move and prices accordingly.

A second-order effect: your engineering culture changes. Teams stop arguing about which model is best in the abstract and start arguing about which model is best for this specific workload, with this specific eval suite, on this specific cost ceiling. That argument is productive. The abstract one is not.

A third effect: new model releases stop being disruptive. When a frontier model ships, you A/B test it against your existing evals on a slice of production traffic. If it wins, you raise its weight in the router. If it does not, you ignore it. The model release calendar stops dictating your roadmap.
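The "raise its weight if it wins" step can be written as a small, auditable rule. The sketch below assumes pass rates from your own eval suite and a per-run cost from your traces. The `margin` and `step` defaults are hypothetical values for illustration, not recommendations. On a small eval set, a two-point lead may well be noise.

```python
def promote(
    weights: dict[str, float],
    incumbent: str,
    challenger: str,
    pass_rates: dict[str, float],    # from the owned eval suite
    cost_per_run: dict[str, float],  # from the owned trace store
    cost_ceiling: float,
    margin: float = 0.02,  # hypothetical: minimum pass-rate lead that counts as a win
    step: float = 0.2,     # hypothetical: share of traffic moved per passing review
) -> dict[str, float]:
    """Return new router weights; unchanged if the challenger does not clearly win."""
    new = dict(weights)
    wins = (
        pass_rates[challenger] >= pass_rates[incumbent] + margin
        and cost_per_run[challenger] <= cost_ceiling
    )
    if wins:
        moved = min(step, new[incumbent])
        new[incumbent] -= moved
        new[challenger] = new.get(challenger, 0.0) + moved
    return new
```

Moving traffic in steps rather than all at once keeps the incumbent warm, so a late regression on the challenger is a weight change, not an incident.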

How OpenNash Can Help

OpenNash builds model-agnostic AI systems where vendors compete for workload. The deliverable is not a model choice. It is the harness around the model: prompts as versioned code, traces and evals in your warehouse, memory in infrastructure you already run, a gateway that lets you swap providers per workflow, and audit logs that survive any vendor change.

A typical engagement maps to this article's framework:

  • Audit. Inventory the seven assets across your current AI program. Find where each one lives and what it would cost to move.
  • Design. Define the harness, the observability plane, and the memory layer. Pick the model gateway and the eval framework. Decide which workloads stay single-vendor and which move to routing.
  • Build. Implement the harness, migrate the assets that are worth migrating, and stand up the eval suite before the first production workload moves.
  • Deploy. Hand off the system with full ownership. Documentation, runbooks, CI for prompts and tools, on-call structure for production agents.

The honest version of the pitch: if your AI program is two pilots and a Slack bot, you do not need this yet. If it is ten agents, three integrations, and a renewal conversation coming up, the math has already turned.

Book a call to map this framework to your workflow. We will look at your current stack, identify where switching cost is accumulating fastest, and tell you honestly whether owning the harness is worth it for your stage.

The model is the spice. Useful, powerful, sometimes expensive. The question is who controls where the spice flows. Build the harness, own the traces, keep the memory, and the answer is you.