EchoMem Research · Dirty Context
What's in the context window?
When you ask Claude Code or Codex to do one small thing, the model does not just read your message. It re-reads the entire worksite — every turn. Here is what is actually inside that window.
A large language model is stateless. It has no memory of the last turn. So an agentic coding tool like Claude Code or Codex fakes memory the only way it can: before every single step, it pastes the whole conversation back into the prompt — your instructions, every file it has opened, every command it has run, every search result, every line of its own reasoning — and sends all of it to the model again.
That pasted-back bundle is the context window. It is not your latest message. It is the entire worksite the model receives on each turn, and it grows with every action the agent takes.
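The loop described above can be sketched as a toy simulation. Everything in it is hypothetical: `count_tokens` is a crude whitespace counter standing in for a real tokenizer, and `fake_model` stands in for the provider call. The only point is that each step's input is the whole transcript so far, so input grows even when output stays flat.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def run_session(user_message: str, steps: int,
                model: Callable[[str], str]) -> List[Tuple[int, int]]:
    """Return (input_tokens, output_tokens) per step for a stateless model."""
    transcript = [user_message]
    ledger = []
    for _ in range(steps):
        prompt = "\n".join(transcript)   # the whole worksite, re-sent every step
        reply = model(prompt)
        ledger.append((count_tokens(prompt), count_tokens(reply)))
        transcript.append(reply)         # append-only: nothing leaves the transcript
    return ledger


def fake_model(prompt: str) -> str:
    """Hypothetical model that always emits the same five-word action."""
    return "ran grep found three matches"


ledger = run_session("fix the login redirect bug", steps=4, model=fake_model)
for step, (inp, out) in enumerate(ledger, start=1):
    print(f"step {step}: input={inp} output={out}")
```

Output per step stays constant in this toy run while input climbs by the size of every earlier reply, which is the shape of the real ledgers discussed next.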
This has a stark consequence. In a real 70-turn Codex session on a web app, the raw ledger summed to 10,911,072 input tokens and just 22,835 output tokens. That is 99.8% input — the model spends almost all of its budget reading the worksite back to itself, and a sliver actually writing code. Across a broader pool of long sessions the figure holds at 99.7%.
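The 99.8% figure follows directly from the two totals quoted above. A minimal check, using only those two numbers:

```python
input_tokens = 10_911_072   # summed input across the 70-turn session (from the text)
output_tokens = 22_835      # summed output across the same session (from the text)

total = input_tokens + output_tokens
input_share = 100 * input_tokens / total
tokens_read_per_token_written = input_tokens / output_tokens

print(f"input share: {input_share:.1f}%")
print(f"tokens read per token written: {tokens_read_per_token_written:.0f}")
```

Put the other way round, the session read roughly 478 tokens for every token it wrote.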
What is actually inside
Every turn the model receives is built from the same parts. Some of it is fixed scaffolding. Some of it is the live work. And a surprisingly large share is the accumulated exhaust of everything the agent has already done.
The breakdown below is the per-turn composition of one real ~110-turn agentic coding window, attributed against ground-truth token counts. Read it as: this is what the model is paying attention to before it does your next task.
- 5.1% System prompt + tool schemas
- 12.0% Live file reads (working set)
- 5.4% Code written this session
- 6.8% Assistant reasoning
- 14.8% Command / test / git output
- 19.7% Search (grep / ripgrep) output
- 7.3% Dead reads (stale / duplicate)
- 1.0% Conversation residue
- 28.0% Serialization + opaque summaries
The thing most people assume fills the window, the actual code the model is reading right now, is only 12%. The live working set is one of the smaller slices of the picture, smaller than search output alone (19.7%).
The parts, grouped
Base scaffolding (~5%)
The system prompt and tool schemas the harness injects on every turn — roughly 8,400 tokens in this session. Fixed cost, paid forever, but small.
Live work (~24%)
The files currently being edited, the code just written, and the model's own reasoning. This is the part that is genuinely doing the task.
Navigation (~35%)
Search output and command, test, and git results. The agent figuring out where things are — re-fed in full long after the answer was used.
Dead reads + residual (~35%)
Stale or duplicate file reads that were never patched, plus the opaque serialization and compaction summaries the provider counts but does not show.
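The group totals can be reproduced from the individual slices. The sketch below assumes a particular assignment of slices to groups, inferred from the group descriptions; the 1.0% conversation residue is not placed in any group by the text, so it is reported separately here.

```python
# Per-turn composition from the breakdown above, in percent of the window.
slices = {
    "System prompt + tool schemas": 5.1,
    "Live file reads (working set)": 12.0,
    "Code written this session": 5.4,
    "Assistant reasoning": 6.8,
    "Command / test / git output": 14.8,
    "Search (grep / ripgrep) output": 19.7,
    "Dead reads (stale / duplicate)": 7.3,
    "Conversation residue": 1.0,
    "Serialization + opaque summaries": 28.0,
}

# Assumed mapping of slices to the four groups described in the text.
groups = {
    "Base scaffolding": ["System prompt + tool schemas"],
    "Live work": ["Live file reads (working set)",
                  "Code written this session",
                  "Assistant reasoning"],
    "Navigation": ["Command / test / git output",
                   "Search (grep / ripgrep) output"],
    "Dead reads + residual": ["Dead reads (stale / duplicate)",
                              "Serialization + opaque summaries"],
}

totals = {name: round(sum(slices[s] for s in members), 1)
          for name, members in groups.items()}
grouped = {s for members in groups.values() for s in members}
ungrouped = round(sum(v for k, v in slices.items() if k not in grouped), 1)

for name, pct in totals.items():
    print(f"{name}: {pct}%")
print(f"Ungrouped (conversation residue): {ungrouped}%")
print(f"All slices: {round(sum(slices.values()), 1)}%")
```

Under this mapping, navigation and dead reads plus residual together account for about 70% of the window, against about 24% for live work.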
Why Codex and Claude Code count differently
The two tools bill the same physical window with different math, so the same session can look cheaper or more expensive depending on which ledger you read.
- Codex (OpenAI) reports fresh input as `input_tokens − cached_input_tokens`. The re-fed history is cached and discounted, but it is still in the window.
- Claude Code (Anthropic) splits it into `cache_read` + `cache_creation` + `input`. Same idea, different columns.
Either way the worksite is re-sent every turn. Caching changes the price of re-feeding it; it does not change the fact that the model still has to read it.
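To compare the two ledgers, both can be normalized into the same two quantities: tokens paid at full price and tokens re-fed from cache. This is a sketch under stated assumptions: the column names follow the wording above rather than either provider's exact schema, treating `cache_creation` as fresh input is a choice made here, and the row values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowUsage:
    fresh: int    # input tokens not served from cache this turn
    cached: int   # input tokens re-fed from cache (discounted, still read)

    @property
    def window(self) -> int:
        """Total tokens the model actually receives on this turn."""
        return self.fresh + self.cached


def from_codex(row: dict) -> WindowUsage:
    # Per the text: input_tokens includes the cached portion.
    cached = row["cached_input_tokens"]
    return WindowUsage(fresh=row["input_tokens"] - cached, cached=cached)


def from_claude_code(row: dict) -> WindowUsage:
    # Per the text: three separate columns that add up to the window.
    # Counting cache_creation as fresh is an assumption of this sketch.
    return WindowUsage(fresh=row["input"] + row["cache_creation"],
                       cached=row["cache_read"])


# Illustrative rows describing the same hypothetical 170k-token turn.
codex = from_codex({"input_tokens": 170_000, "cached_input_tokens": 160_000})
claude = from_claude_code({"input": 2_000, "cache_creation": 8_000,
                           "cache_read": 160_000})

print(codex.window, codex.fresh)
print(claude.window, claude.fresh)
```

Both rows normalize to the same window size, which is the quantity that determines how much the model has to read regardless of how it is billed.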
The window looks different per kind of work
What dominates the context window depends heavily on what the agent is doing. The same tool, pointed at three different jobs, fills its window in three different ways.
| Session | Navigation | Dead reads | What dominates |
|---|---|---|---|
| Code session (70 turns · Codex · web app) | 35% | 16% | 72% of all file reads were stale by the time they were re-fed. |
| Design session (16 turns · screenshot-driven) | ~20% | 3% | Image tokens (~30%) dominate and mask the code rot underneath. |
| Long-haul session (334 turns · 171 hours) | high | rising | 571M input tokens, 92.9% re-fed from cache, for 52 changed files. |
In code work, dead reads run ~5× higher than in image-heavy design work — where screenshot tokens visually dominate and hide the rot underneath.
The window only grows — until it snaps
Because the context is append-only, the window climbs every turn: from ~100k tokens early in a session toward the model's ceiling of ~220–235k. Nothing leaves on its own. Finished subtasks, used search results, and stale file reads all keep riding along.
When it hits the ceiling, the harness compacts — truncating and micro-summarizing the history rather than rewriting it — and the window drops back down, only to climb again. Plotted over a session it makes a sawtooth: grow, snap, grow, snap. Every tooth is the model re-reading work it already did.
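The sawtooth can be reproduced with a few lines of simulation. Only the ~100k starting size and a ceiling in the 220k to 235k range come from the text; the per-turn growth and the post-compaction size are hypothetical values chosen to make the shape visible, and a real harness decides when and how far to compact in its own way.

```python
def simulate_window(turns: int,
                    start: int = 100_000,             # early-session size (from the text)
                    ceiling: int = 230_000,           # within the ~220-235k range in the text
                    growth_per_turn: int = 6_000,     # assumed: tokens appended per turn
                    after_compaction: int = 120_000,  # assumed: size after a compaction
                    ):
    """Return (window size after each turn, number of compactions)."""
    sizes = []
    size = start
    compactions = 0
    for _ in range(turns):
        if size + growth_per_turn > ceiling:
            size = after_compaction   # the "snap": history truncated and summarized
            compactions += 1
        size += growth_per_turn       # append-only growth within a tooth
        sizes.append(size)
    return sizes, compactions


sizes, compactions = simulate_window(turns=70)
print(f"compactions: {compactions}")
print(f"peak window: {max(sizes)}")
print(f"total tokens read across the session: {sum(sizes)}")
```

The sum of the per-turn window sizes is the session's total input, which is why a window that hovers near the ceiling makes every additional turn expensive.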
Why this is the whole story
When agentic coding feels slow or expensive, the instinct is to blame the model — not smart enough, not enough context. The window says otherwise. The model is not short on context. It is drowning in it: re-reading 170k tokens of history to produce a few hundred new ones, with the genuinely live working set down at 12%.
Input tokens are not just the user's latest message. They are the whole worksite the model receives every turn.
That is the symptom. In the next note we look at the mechanism — how the useful share of the window decays as the session runs, and why long sessions pay a steeply nonlinear cost for every commit.
The numbers, sourced
Composition is from a residual deep-accounting pass over a ~110-turn session, attributed against ground-truth cache and token counts. Percentages sum to 100 (±0.1 rounding).
Your AI is not slow because it thinks. It is slow because it keeps re-reading.
Echo gives agents a working memory that persists across turns and tools — so the next turn carries only what the current task needs, instead of re-feeding the entire worksite.