Context Engineering Explained: How Compaction Enhances AI Model Performance

Everyone talks about prompts. Almost nobody draws the thing that actually reaches the model. The prompt is one input among several. Context engineering is the discipline of deciding what gets assembled into the window before a single token is generated — and, just as importantly, what gets thrown away. Here is the drawing.

Three sources feed an assembly stage with a fixed budget. The dashed loop — deciding what to keep and what to compact — is the part teams leave off the whiteboard.

The three sources are the easy part.

Retrieval pulls documents. Tools return live results. Memory carries prior turns and durable facts about the user. These are the boxes people draw confidently, because they map onto systems engineers already understand — a search index, an API, a database. The arrows feel obvious.

The context window is a budget, not a bucket.

The window has a fixed size. Every token from every source competes for the same finite room. When the three sources together produce more than fits — and they almost always do — something has to be cut. The decision of what to cut is not a side effect. It is the central act of context engineering, and it happens here, at assembly, before the model sees anything.
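
To make the budget idea concrete, here is a minimal sketch of an assembly step. It is an illustration under stated assumptions, not a description of any particular system: the `Item` type, the `priority` score, and the `assemble` function are hypothetical names, and `count_tokens` counts whitespace-separated words as a stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str      # "retrieval", "tool", or "memory"
    text: str
    priority: float  # hypothetical relevance score; higher is kept first


def count_tokens(text: str) -> int:
    # Assumption: word count stands in for a real tokenizer.
    return len(text.split())


def assemble(items: list[Item], budget: int) -> tuple[list[Item], list[Item]]:
    """Greedily keep the highest-priority items that fit in the budget.

    Returns (kept, cut) so the discard is visible rather than silent.
    """
    kept: list[Item] = []
    cut: list[Item] = []
    remaining = budget
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = count_tokens(item.text)
        if cost <= remaining:
            kept.append(item)
            remaining -= cost
        else:
            cut.append(item)
    return kept, cut


if __name__ == "__main__":
    items = [
        Item("retrieval", "a b c d e", 0.9),
        Item("tool", "x y z", 0.8),
        Item("memory", "m n o p", 0.5),
    ]
    kept, cut = assemble(items, budget=9)
    print("kept:", [i.source for i in kept])
    print("cut:", [i.source for i in cut])
```

Returning the cut list alongside the kept list is a deliberate choice in this sketch: it makes the decision of what to cut something you can log and inspect, not a side effect.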

The part everyone forgets to draw.

Look at the dashed orange loop. That is compaction — the policy that decides what gets summarized, demoted, or dropped as the window fills. Teams draw retrieval, tools, memory, and the model, then wire them straight through and wonder why quality degrades over a long session. The honest answer is that they never designed the discard.

The model only ever sees what the assembly stage decided to keep. Context engineering is mostly the engineering of what to leave out.
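
One way such a policy could look, as a rough sketch only: summarize the oldest turns first, then drop them if the window is still over budget, and leave the most recent turns untouched. Everything here is assumed for illustration. `compact`, `summarize`, and `keep_recent` are hypothetical names, the summarizer simply truncates where a real system would likely call a model, and tokens are approximated by word count.

```python
def count_tokens(text: str) -> int:
    # Assumption: word count stands in for a real tokenizer.
    return len(text.split())


def summarize(text: str, max_tokens: int = 8) -> str:
    # Placeholder summarizer: keeps only the first few words.
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])


def total_tokens(turns: list[str]) -> int:
    return sum(count_tokens(t) for t in turns)


def compact(turns: list[str], budget: int, keep_recent: int = 2):
    """Compact a session history (oldest turn first) toward a token budget.

    Returns (turns, log). The log records each action with the turn's
    original index, so every discard can be audited later.
    """
    turns = list(turns)
    log: list[tuple[str, int]] = []

    # Pass 1: summarize the oldest turns, leaving recent ones untouched.
    for i in range(max(0, len(turns) - keep_recent)):
        if total_tokens(turns) <= budget:
            break
        short = summarize(turns[i])
        if short != turns[i]:
            turns[i] = short
            log.append(("summarized", i))

    # Pass 2: drop the oldest turns if summaries were not enough.
    # Recent turns are never dropped, so the result can still exceed
    # the budget if those turns alone are too large.
    dropped = 0
    while total_tokens(turns) > budget and len(turns) > keep_recent:
        turns.pop(0)
        log.append(("dropped", dropped))
        dropped += 1

    return turns, log
```

The log is the point of the sketch. A policy that records what it summarized and what it dropped can be replayed and tested; one that silently trims cannot.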

Why this matters in production.

A retrieval bug shows up as a missing fact. A compaction bug shows up as a model that was right an hour ago and is confidently wrong now — much harder to catch.
We treat the discard policy as code, with its own evals, not as an afterthought tuned by feel.
Boring wins here too. A predictable, well-tested compaction step beats a clever one nobody can reason about at 2am.
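
As a hedged illustration of what an eval for a discard policy might look like, the sketch below checks that facts a session depends on still appear in the context after compaction. The names `eval_policy`, `drop_oldest`, and the case format are hypothetical, invented for this example, and the substring check is a deliberately crude stand-in for a real grader.

```python
from typing import Callable

# A policy takes (turns, budget) and returns the turns it decided to keep.
Policy = Callable[[list[str], int], list[str]]


def drop_oldest(turns: list[str], budget: int) -> list[str]:
    # Example policy under test: drop oldest turns until the word count fits.
    kept = list(turns)
    while kept and sum(len(t.split()) for t in kept) > budget:
        kept.pop(0)
    return kept


def eval_policy(policy: Policy, cases: list[dict]) -> list[str]:
    """Return a failure message for every required fact the policy lost."""
    failures = []
    for case in cases:
        context = " ".join(policy(case["turns"], case["budget"]))
        for fact in case["must_keep"]:
            if fact not in context:
                failures.append(f"{case['name']}: lost {fact!r}")
    return failures


CASES = [
    {
        "name": "allergy",
        "turns": [
            "The user is allergic to penicillin",
            " ".join(["filler"] * 10),
            "What can I take for a headache",
        ],
        "budget": 18,
        "must_keep": ["penicillin"],
    },
]

if __name__ == "__main__":
    for failure in eval_policy(drop_oldest, CASES):
        print("FAIL", failure)
```

With this case, the naive drop-oldest policy discards the one turn that mattered, which is the "right an hour ago, confidently wrong now" failure in miniature. Running such cases in CI is one way to catch it before a user does.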

— Silicon Prime team. Walnut Creek, CA. June 2026.

