Context engineering for enterprise agents: why prompts are not enough

2026-05-19 · 2 min read

Most agent projects do not fail because the model is weak. They fail because the system gives the model the wrong context, too much context, stale context, or context that the current user should not have.

That is why I treat context engineering as an architecture problem, not a prompting trick.

What context really means

In an enterprise agent, context is everything the model can use to decide what to say or do:

Task context: what the user is trying to accomplish now.
Business context: rules, policies, product facts, pricing, contracts, and internal definitions.
User context: role, permissions, region, account, current workflow state.
Tool context: what tools exist, what each tool can do, and what side effects it may cause.
Memory context: what should persist across sessions and what must be forgotten.
Evaluation context: what a good answer or action looks like for this workflow.

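If these layers are not separated, the system becomes hard to debug, because you cannot tell which layer introduced a bad fact, and easy to attack, because untrusted text can pass as policy or tool instructions.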
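One way to keep the layers apart is to give each its own field in a single typed structure and to render them under explicit labels. The following is a minimal sketch, not a prescribed schema: the names ContextPackage and render, the field names, and the bracketed section labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextPackage:
    """Hypothetical container: one field per context layer."""
    task: str                                            # what the user wants now
    business: list[str] = field(default_factory=list)    # rules, policies, product facts
    user: dict[str, str] = field(default_factory=dict)   # role, region, workflow state
    tools: list[str] = field(default_factory=list)       # tool descriptions offered right now
    memory: list[str] = field(default_factory=list)      # facts carried across sessions
    evaluation: str = ""                                 # what a good outcome looks like


def render(pkg: ContextPackage) -> str:
    """Render each non-empty layer under its own label so it can be inspected."""
    sections = [
        ("TASK", [pkg.task]),
        ("BUSINESS", pkg.business),
        ("USER", [f"{key}={value}" for key, value in sorted(pkg.user.items())]),
        ("TOOLS", pkg.tools),
        ("MEMORY", pkg.memory),
        ("EVALUATION", [pkg.evaluation] if pkg.evaluation else []),
    ]
    lines: list[str] = []
    for label, items in sections:
        if items:  # empty layers are omitted entirely
            lines.append(f"[{label}]")
            lines.extend(items)
    return "\n".join(lines)
```

Because every layer has a named field, a log entry or a test can assert on one layer (for example, that memory is empty for a new session) without parsing the whole prompt.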

The production pattern

The pattern I prefer is explicit and boring:

classify the user intent;
retrieve only the minimum relevant knowledge;
filter by permissions at retrieval time, so the model never sees data the current user is not allowed to see;
inject tool instructions only when the tool is available in the current state;
keep short-term and long-term memory separate;
log the context package that was actually sent to the model;
run evals on context quality, not only on final answers.

This makes the system inspectable. When the agent fails, you can see whether it failed because of retrieval, policy, tool choice, prompt design, model behavior, or missing product logic.
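The first six steps of the pattern can be sketched as one small function that assembles a context package and a second that logs it. This is an illustration under stated assumptions, not a reference implementation: Doc, Tool, classify_intent, build_context, and log_package are hypothetical names, the keyword classifier stands in for whatever intent model a real system uses, and the in-memory lists stand in for a real document store and log sink.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    intent: str                    # intent this document serves
    allowed_roles: frozenset[str]  # roles permitted to read it


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    states: frozenset[str]         # workflow states in which the tool may be offered


def classify_intent(message: str) -> str:
    """Toy keyword classifier (assumption); stands in for a real intent model."""
    return "refund" if "refund" in message.lower() else "general"


def build_context(message, role, state, docs, tools,
                  session_notes, long_term_facts, max_docs=2):
    """Assemble the package that would be sent to the model."""
    intent = classify_intent(message)
    # The permission check is part of selection, so text the role may not
    # read is never a candidate for the prompt.
    visible = [d for d in docs if role in d.allowed_roles and d.intent == intent]
    return {
        "intent": intent,
        "knowledge": [d.text for d in visible[:max_docs]],  # minimum relevant set
        # Tool instructions appear only if the tool is usable in this state.
        "tools": [{"name": t.name, "description": t.description}
                  for t in tools if state in t.states],
        # Short-term and long-term memory stay in separate fields.
        "short_term_memory": list(session_notes),
        "long_term_memory": list(long_term_facts),
    }


def log_package(package, sink):
    """Append the exact package as one JSON line to a list-like sink."""
    sink.append(json.dumps(package, sort_keys=True))
```

With this shape, a failed interaction can be replayed from the logged JSON line alone: an empty "tools" list points at workflow state, a missing fact in "knowledge" points at retrieval or permissions, and anything else points at the prompt or the model.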

What to audit

When I review an agent architecture, I look for these failure modes:

the prompt contains too much permanent policy text;
RAG returns documents that are relevant but not actionable;
sensitive context is retrieved before access control;
tool descriptions are vague and invite misuse;
memory accumulates unverified facts;
nobody tracks context window cost (tokens sent per request) or contamination (irrelevant or untrusted text that ends up in the prompt);
evals only check tone, not workflow correctness.
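Some of these failure modes can be checked mechanically against the context that was sent, before anyone reads the final answer. The sketch below is one possible shape for such a check, with several assumptions: evaluate_context is a hypothetical helper, the package is a plain dict with a "knowledge" list of strings, required and forbidden items are matched as case-insensitive substrings, and word count is used as a rough stand-in for token count.

```python
def evaluate_context(package, required, forbidden, max_words):
    """Score a context package on coverage, leakage, and size.

    required:  phrases that must appear for the workflow to be answerable
    forbidden: phrases that must never reach the model for this user
    max_words: crude size budget (word count, not real tokens)
    """
    text = " ".join(package["knowledge"]).lower()
    words = len(text.split())
    return {
        "missing": [r for r in required if r.lower() not in text],
        "leaked": [f for f in forbidden if f.lower() in text],
        "words": words,
        "over_budget": words > max_words,
    }


# Usage: a package that covers the task but leaks an internal fact.
report = evaluate_context(
    {"knowledge": ["Refunds under 50 EUR are auto-approved.",
                   "Internal margin is 40%."]},
    required=["auto-approved"],
    forbidden=["internal margin"],
    max_words=8,
)
```

Substring matching is deliberately simple here; the point is that coverage, leakage, and size are measured on the context itself, so a polite but wrong answer cannot hide a retrieval or access-control failure.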

The question to ask

Do not ask “how do we write a better prompt?”

Ask:

What exact context should this model receive, from which source, under which permission, for this task, and how do we know it worked?

That question turns agent work from vibe-driven experimentation into an engineering discipline.