What is context engineering for AI agents?

Context engineering is the practice of delivering the right business information to an AI agent at the right moment, from the right source, with the right permissions. It covers where data lives, how fresh it is, how the agent retrieves it, and how the agent writes results back. It is the production discipline that sits underneath the prompt.

How is context engineering different from prompt engineering?

Prompt engineering shapes how you ask the model to reason. Context engineering shapes what the model is allowed to see and do. You can write a perfect prompt and still get a wrong answer if the agent reads stale order data or cannot reach the system of record. In production, context decides reliability more than wording does.

What data does an AI agent actually need?

It depends on the job, but most business agents pull from a mix of structured systems (CRM, ERP, billing, data warehouse) and unstructured sources (email, support tickets, documents, wikis). The work is deciding which source is authoritative for each fact, how current it needs to be, and which records the agent is permitted to touch.

Why do AI agents fail in production after a good demo?

Demos run on clean, recent, single-source data. Production runs on conflicting records, stale syncs, and permission boundaries. The model behaves identically in both, so the failure shows up in the data layer: wrong source, old data, or a record the agent could not reach. Fixing the prompt rarely helps.

How should AI agents handle permissions?

An agent should act with scoped, auditable permissions, never a blanket admin token. Decide whose access it inherits, enforce row-level and record-level limits at the data source, and log every read and write. Treat writeback as a privileged action that may require a human approval gate for high-impact changes.

Data and Context Engineering for AI Agents: The Work Before the Workflow Works

A refund agent we reviewed last quarter handled its demo perfectly. A customer asked for a refund, the agent pulled up the order, checked the return window, issued the credit, and posted a clean confirmation. Two weeks into production it started telling customers their orders did not exist.

Nothing was wrong with the model. The order data lived in three systems. The agent was wired to the one that synced overnight, while the freshest record sat in a checkout service it had never been connected to. New orders were invisible to it for up to twenty-four hours. The demo used a week-old test account, so the gap never showed. The model did exactly what it was told, against data that was quietly wrong.

That is the pattern behind most agent disappointments. The reasoning works. The plumbing does not. Before a workflow can work, someone has to do the unglamorous job of getting the right business context to the agent at the right moment - and writing its results back without breaking anything.

The Model Was Never the Problem

There is a reflex, when an agent gives a bad answer, to reach for a bigger model or a cleverer prompt. Sometimes that is the fix. More often the agent reasoned correctly over the wrong inputs.

This is why "context engineering" has become its own discipline rather than a footnote in prompt design. Atlan's 2026 guide frames it as the work of assembling and delivering the information a model needs to act, as opposed to just phrasing the request well. Analysts have started naming it directly. Gartner expects context engineering to become a dominant practice in enterprise AI tooling by 2028, and Salesforce's 2026 trends roundup lists context grounding as one of the defining shifts of the year.

The split that matters in practice is between two kinds of context:

Structured context lives in databases, CRMs, ERPs, billing systems, and warehouses. It is queryable, typed, and usually has a clear owner. "What is this customer's plan tier?" has one correct answer in one place.
Unstructured context lives in email threads, support tickets, PDFs, contracts, wikis, and call transcripts. It is rich but messy. "Why is this customer unhappy?" has no single field to read.

Production agents need both, and they need them stitched together. The structured side tells the agent what is true. The unstructured side tells it what is going on. Get the join wrong and the agent sounds confident while being useless.

The takeaway: treat a wrong answer as a data question first and a model question second.

Map the Sources Before You Map the Workflow

Most teams open a workflow builder and start dragging nodes. The better first move is to inventory every place business context actually lives, and to decide which source is authoritative for each fact the agent will rely on.

Here is the working map we use for a typical mid-market or PE-backed operation:

Source type	System	What it answers	Structured?
Customer	CRM (Salesforce, HubSpot)	Who is this, what stage, what history	Yes
Transactions	ERP / billing	Orders, invoices, payments, entitlements	Yes
Support	Helpdesk (Zendesk, Intercom)	Open issues, prior resolutions	Mixed
Communication	Inbox / Slack	Latest human context, commitments made	No
Knowledge	Docs, wiki, contracts	Policies, procedures, terms	No
Analytics	Data warehouse	Aggregates, segments, history	Yes
Internal tools	Custom apps, scripts	Operational state	Varies

The discipline is naming the system of record for each fact, because the same fact often appears in several systems with different values. Glean's analysis of full context in workflows makes the point that context fragmented across tools is the normal enterprise condition, not the exception. An agent that reads "account status" from the CRM when the billing system is authoritative will be wrong every time the two disagree, which in a real company is often.

We wrote a full piece on this discipline in Find the System of Record. For agents, the rule sharpens: every field the agent reads should have exactly one source it trusts, and every other copy is a cache it should distrust.
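
One way to make that rule concrete is a small registry that names the trusted source for each fact and refuses to fall back to a copy. This is a minimal sketch, not a prescribed implementation: the fact names, system names, and the `resolve` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactSource:
    fact: str
    system_of_record: str         # the one source the agent trusts
    caches: tuple[str, ...] = ()  # other copies, treated as untrusted


# Hypothetical registry; fact and system names are illustrative only.
REGISTRY = {
    s.fact: s
    for s in (
        FactSource("account_status", "billing", caches=("crm",)),
        FactSource("plan_tier", "billing", caches=("crm", "warehouse")),
        FactSource("deal_stage", "crm"),
    )
}


def resolve(fact: str, values_by_system: dict[str, str]) -> str:
    """Return the value from the system of record, ignoring every other copy."""
    source = REGISTRY.get(fact)
    if source is None:
        raise KeyError(f"no system of record declared for {fact!r}")
    if source.system_of_record not in values_by_system:
        # Fail loudly instead of silently reading a cache.
        raise LookupError(
            f"{source.system_of_record} returned nothing for {fact!r}"
        )
    return values_by_system[source.system_of_record]


# The CRM and billing disagree; the registry says billing wins.
status = resolve("account_status", {"crm": "active", "billing": "past_due"})
```

The choice to raise when the system of record is unreachable is deliberate in this sketch: an agent that says "I could not check" is easier to recover from than one that answers from a stale copy.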

The takeaway: you cannot design a reliable retrieval path until you know which system owns each fact.

Freshness Is a Feature, Not an Afterthought

Once you know where data lives, the next question is how old it is allowed to be. This is where the refund agent died.

Freshness is not one setting. It is a per-source decision. A pricing policy that changes quarterly can be cached for a day without harm. An order status can change by the minute, and an agent reading a nightly sync will hand customers stale answers. The mistake is applying one freshness assumption to every source because that is how the nightly ETL job already runs.

Materialize's work on real-world context engineering lands on this directly: agents acting on operational decisions need operational freshness, which batch pipelines built for dashboards rarely provide. A report can be a day late. An agent issuing credits or rescheduling deliveries cannot.

Practical guidance per fact type:

Reference data (policies, product catalogs, terms): cache aggressively, refresh on a schedule.
State data (order status, ticket status, inventory): read live from the system of record, or use change-data-capture so the agent sees updates within seconds.
Derived data (segments, scores, aggregates): accept staleness, but tell the agent the as-of timestamp so it can hedge.

That last point is underrated. When an agent knows a number is "as of last night," it can say so instead of asserting it as current. Passing freshness metadata alongside the value is cheap and it changes how the model talks to your customers.
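
A rough sketch of what passing freshness metadata can look like, assuming each value carries an as-of timestamp and a fact type. The staleness budgets, the `ContextValue` type, and the `render_for_prompt` helper are hypothetical and would need tuning per source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical staleness budgets per fact type; tune these per source.
MAX_AGE = {
    "reference": timedelta(days=1),
    "state": timedelta(seconds=30),
    "derived": timedelta(days=2),
}


@dataclass(frozen=True)
class ContextValue:
    name: str
    value: object
    kind: str        # "reference", "state", or "derived"
    as_of: datetime  # when the source last confirmed this value


def render_for_prompt(item: ContextValue, now: datetime) -> str:
    """Format a value with its timestamp, flagging it when over budget."""
    line = f"{item.name} = {item.value} (as of {item.as_of.isoformat()})"
    if now - item.as_of > MAX_AGE[item.kind]:
        line += " [STALE: do not state as current]"
    return line


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
order = ContextValue("order_status", "shipped", "state", now - timedelta(minutes=10))
score = ContextValue("churn_score", 0.42, "derived", now - timedelta(hours=14))

print(render_for_prompt(order, now))  # flagged: ten minutes exceeds the state budget
print(render_for_prompt(score, now))  # not flagged: within the derived budget
```

The timestamp travels with the value into the prompt either way, so the model can hedge even when nothing is flagged.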

The takeaway: set freshness per source, and make staleness visible to the agent rather than hidden.

Permissions and Retrieval Paths

An agent is a new actor in your systems, and the first question security should ask is: whose access does it have?

The lazy answer is a single service account with broad read and write rights. That works in a demo and is a liability in production. If the agent can read every customer record and is exposed to untrusted input (a customer message, a forwarded email), it becomes a path to data it should never surface. The Codingscape guide to building production agents spends most of its length on exactly this: scoping access, enforcing limits at the data layer, and not handing an autonomous process the keys to your database.

The design moves that hold up:

Scope to the task. A support agent needs the requesting customer's records, not the whole table. Enforce row-level security at the source, not in the prompt, because prompt-level rules are suggestions and database rules are walls.
Inherit a real identity. Where possible, the agent should act on behalf of a specific user or role with that user's permissions, so the access boundary already exists.
Make retrieval explicit. Define which sources the agent may query for which intents. An agent that can call any tool for any reason is harder to reason about and easier to exploit.
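
To illustrate scoping at the data layer, here is a minimal sketch using Python's built-in `sqlite3`. SQLite has no native row-level security, so a reader function bound to one customer stands in for what a database policy would enforce in a real deployment. The table layout and the `make_scoped_reader` helper are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def make_scoped_reader(conn: sqlite3.Connection, customer_id: int):
    """Return a read function bound to a single customer.

    customer_id comes from the authenticated session, never from model output.
    """
    def read_orders(status: Optional[str] = None) -> list:
        sql = "SELECT id, status FROM orders WHERE customer_id = ?"
        params = [customer_id]
        if status is not None:
            sql += " AND status = ?"
            params.append(status)  # bound as a parameter, never interpolated
        return conn.execute(sql + " ORDER BY id", params).fetchall()

    return read_orders


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "shipped"), (2, 11, "paid"), (3, 10, "paid")],
)

# The agent only ever receives this function, not the connection.
read_orders = make_scoped_reader(conn, customer_id=10)
rows = read_orders()
```

Whatever the model asks for, the only filter it can influence is `status`, and that value is bound as a query parameter. Customer 11's order is unreachable through this tool regardless of what the prompt says.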

Retrieval path design also shapes quality, not just safety. An agent that retrieves from five systems on every request is slow and noisy. One that retrieves from the wrong system is wrong. The patterns we keep returning to are covered in agentic knowledge base patterns, where the retrieval layer becomes a routing decision rather than a blind vector search.
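
Retrieval as a routing decision can be sketched as a declared map from intent to permitted sources. The intents, source names, and fetcher functions here are hypothetical stand-ins for real connectors.

```python
from typing import Callable

# Hypothetical routes: which sources may be queried for which intent.
ROUTES = {
    "refund_request": ("billing", "helpdesk"),
    "order_status": ("billing",),
    "policy_question": ("docs",),
}

# Hypothetical fetchers; each stands in for a real connector.
FETCHERS: dict[str, Callable[[str], str]] = {
    "billing": lambda query: f"billing:{query}",
    "helpdesk": lambda query: f"helpdesk:{query}",
    "docs": lambda query: f"docs:{query}",
    "warehouse": lambda query: f"warehouse:{query}",
}


def retrieve(intent: str, query: str) -> dict[str, str]:
    """Query only the sources declared for this intent."""
    sources = ROUTES.get(intent)
    if sources is None:
        # An undeclared intent gets no data at all.
        raise PermissionError(f"no retrieval route declared for intent {intent!r}")
    return {name: FETCHERS[name](query) for name in sources}


result = retrieve("order_status", "A-1001")
```

The warehouse connector exists in this sketch but no route points to it, so no intent can reach it until someone adds a route on purpose.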

The takeaway: permissions belong at the data source, and retrieval should be a deliberate route, not a free-for-all.

Writeback: The Half Everyone Forgets

Read-only agents are easy to feel good about. They summarize, they draft, they answer. They are also, mostly, expensive research assistants. The value shows up when the agent updates the CRM, posts the refund, files the ticket, books the meeting - when it writes.

Writeback is where the engineering gets serious, because a wrong read is embarrassing and a wrong write is damage. Three controls separate a worker from a hazard:

Idempotency. If the agent retries after a timeout, it must not issue the refund twice. Design writes so repeating them is harmless, usually with an operation key the target system deduplicates on.
Approval gates. High-impact actions (anything touching money, contracts, or external customers) route through a human until the agent has earned trust on that action class. This is not a permanent crutch; it is how you ship before you have months of evidence.
Audit logs. Every write records what changed, why, and on whose authority. When something goes wrong at 2am, the trace is the difference between a five-minute fix and a forensic project.
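
The three controls can be sketched together in a few lines. This is an in-memory illustration under stated assumptions: in a real system the target service would deduplicate on the operation key, and the approval threshold, field names, and `RefundWriter` class are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RefundWriter:
    approval_threshold: float = 100.0            # hypothetical limit, in currency units
    applied: dict = field(default_factory=dict)  # operation key -> outcome
    audit_log: list = field(default_factory=list)

    def refund(self, key: str, order_id: str, amount: float,
               actor: str, reason: str, approved_by: Optional[str] = None) -> str:
        # Idempotency: a retry with the same key returns the earlier outcome
        # and neither writes nor logs again.
        if key in self.applied:
            return self.applied[key]
        # Approval gate: large refunds wait for a named human approver.
        if amount > self.approval_threshold and approved_by is None:
            outcome = "pending_approval"
        else:
            outcome = "refunded"
            self.applied[key] = outcome  # only completed writes are deduplicated
        # Audit log: what changed, why, and on whose authority.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "key": key, "order_id": order_id, "amount": amount,
            "actor": actor, "approved_by": approved_by,
            "reason": reason, "outcome": outcome,
        })
        return outcome


writer = RefundWriter()
first = writer.refund("op-1", "A-1001", 40.0, "refund-agent", "late delivery")
retry = writer.refund("op-1", "A-1001", 40.0, "refund-agent", "late delivery")
```

The retry returns the same outcome without issuing a second refund or a second log entry. A refund above the threshold comes back as `pending_approval` until the same key is resubmitted with an approver.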

Observability is the connective tissue here. LangSmith's observability docs show why you want every retrieval and every write captured as a trace: when an agent does the wrong thing, you need to see which context it read and which action it took, not guess. The agents that survive contact with production are the ones whose every decision can be replayed.

This is also where memory and writeback meet. An agent that writes its own observations back into a store it later reads is building state, and that state needs the same freshness and permission discipline as everything else. We go deeper on that in AI agent memory architectures.

The takeaway: an agent that cannot write safely is a draft generator. The writeback controls are what make it a coworker.

A Context Readiness Check

Before building, run the workflow through five questions. If you cannot answer them, you are not ready to draw nodes.

Sources. For every fact the agent needs, what is the single system of record?
Freshness. How current must each fact be, and does the current pipeline deliver that?
Permissions. Whose access does the agent inherit, and is it enforced at the data layer?
Retrieval. Which sources does the agent query for which intent, and how does it avoid pulling the wrong one?
Writeback. What can the agent change, what requires approval, and is every write idempotent and logged?
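
If it helps to make the gate explicit, the five questions can be turned into a trivial pre-build check. This is only a sketch; the `unanswered` helper and the answer format are hypothetical.

```python
QUESTIONS = ("sources", "freshness", "permissions", "retrieval", "writeback")


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the readiness questions that have no non-empty answer."""
    return [q for q in QUESTIONS if not answers.get(q, "").strip()]


gaps = unanswered({
    "sources": "billing owns account_status; CRM owns deal_stage",
    "freshness": "   ",  # a blank answer does not count
})
```

An empty result is the signal to start drawing nodes; anything else names the homework.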

This sequence is the difference between an agent that demos well and one that runs. The same read-reason-write loop, with freshness on the way in and audit on the way out, is what the diagram above maps. Jon Radoff's state of AI agents in 2026 argues that the frontier this year is less about model capability and more about the surrounding systems, and this checklist is that argument made concrete. The work is in the wiring.

For PE-backed and mid-market operators specifically, WorkWise's 2026 guide to AI agents in private equity shows the same constraint: the value cases (diligence, portfolio monitoring, deal sourcing) all depend on agents reaching fragmented data across portfolio companies. The fragmentation is the project. The agent is the easy part.

How OpenNash Can Help

Our team spent years on the data side of this problem before agents were the headline - building on warehouses like Snowflake and orchestrating with the LangChain stack. That order matters. We came to agents from the data layer, which is why we treat context engineering as the first deliverable, not a cleanup phase.

A typical engagement runs in the same order as this article:

Audit. We map your sources, name the system of record for each fact, and find the freshness and permission gaps that will bite in production.
Design. We define retrieval paths, approval gates, and writeback controls before any agent reasons over live data.
Build. We implement against your real systems, with observability wired in so every read and write is traceable.
Deploy and own. You get the working system, the documentation, and full ownership. No black box, no lock-in.

If you are weighing a platform instead, that can be the right call. If your data already lives in one suite and your use case fits a vendor's template, buy the platform. Custom is worth it when your context is fragmented across systems, your workflows are specific, and you need to own the result. And if your data is not yet trustworthy, the honest answer is to fix the data layer before you build the agent at all.

We made a longer version of that ownership argument in Open AI Strategy: why model routing, traces, and memory beat vendor lock-in.

Book a call to map this context layer to your workflow before you build the agent on top of it.

The agent is the part everyone wants to talk about. The reason it works, or does not, is the week of data engineering nobody put on the slide.