Claude Code feels like an ongoing conversation, but each model call is a stateless request. The model does not remember the previous turn by itself. Claude Code has to rebuild the working session by sending the current message together with the system prompt, tool schemas, active skills, memory, conversation history, file reads, and tool results.
The practical consequence is simple: context management is not a nicety around the model. It is the mechanism that gives the model working memory. When that memory is clean, the model can stay sharp. When it is noisy, stale, or overstuffed, you get drift.
The context window is the only memory that matters at inference time
The server handling a Claude request has no hidden memory of the last request. It does not privately remember your repository, your previous constraints, or the decision you made thirty messages ago. The model receives a payload, computes a response, and the request state is discarded.
Claude Code creates continuity by constructing that payload on every turn. It includes the conversation history and the surrounding operating context so the model can behave as if it has been following along. That design is powerful, but it means every included token competes for attention. A stale tool result, a large pasted log, or an overgrown instruction file is not inert history. It is live input.
This is the key mental model: the context window is not a diary. It is a working set. Treat it like memory pressure in a running system, not like an archive you can append to forever.
What you are spending on each turn
Context cost is easy to underestimate because the expensive pieces are not always visible in the chat. Your message may be short, but the request can still be large because tool schemas, project instructions, skill instructions, conversation history, and prior reads are traveling with it.
Base operating instructions for Claude Code. Always present.
~6kJSON schemas for available tools. Loaded so the model knows what it can call.
12-70kWorkflow instructions and specialist guidance active in the session.
~3kProject and user guidance. Useful when lean, expensive when used as a knowledge dump.
<1k-10k+All prior messages reconstructed every turn.
GrowsA 500-line file can cost roughly 3-4k tokens and stays in session context.
3-4k/fileA full dashboard JSON, log dump, or MCP response can swamp the useful signal.
10-15k+The main surprise is schema overhead. If three MCP servers are connected, the model may receive tens of thousands of tokens of tool descriptions before your actual task content. That overhead is paid whether or not you call those tools. Disabling a server you will not use in the current session is the same kind of optimization as trimming CLAUDE.md: less always-on input for the model to carry.
/context is the diagnostic command here. It lets you inspect token counts by category, which tools are loaded, and which skills are active. Run it before a long task, not only after the session starts behaving strangely.
Bigger windows reduce hard limits, not attention costs
A larger context window is genuinely useful. It lets you load more source, inspect larger document sets, and keep complex work alive longer. But it does not make the model equally sensitive to every token. Long-context models can still under-attend to information buried in the middle of a large prompt, and reasoning quality can degrade before the request hits the formal limit.
That is why "we have a 1M-token window" is not a reason to stop curating. At 100k tokens, context hygiene is about staying focused. At 600k tokens, it is about preventing the model from having to search a haystack every time it answers. More window gives you more room to operate; it does not turn a messy working set into a clean one.
The right question before reading another file or running another broad query is not "might this be useful someday?" It is "does this belong in the model's working memory for the next decision?"
What context rot looks like
Claude has no awareness of what it costs to keep things around. It cannot see how full the window is, decide to forget an old tool result, or flag that a file from three tasks ago is still consuming attention. It reasons from whatever is in front of it, all of it, every turn.
The symptoms usually look like ordinary quality loss rather than an explicit failure:
- It contradicts decisions made earlier in the session.
- It misses constraints that are technically still present but buried.
- Answers get longer, less decisive, and more hedged.
- It starts recapping context instead of acting on it.
None of this throws an error. By the time it is obvious, the session has often been producing slightly worse output for a while.
If you were starting fresh right now, what would you actually bring? The difficulty of cutting is not evidence that cutting is wrong.
Compaction is useful, but it is not a strategy
Auto-compaction compresses earlier turns into a summary and discards the original content. The session can continue, but the model is no longer reasoning from the exact decisions, constraints, and evidence. It is reasoning from a lossy map of them.
Manual /compact is better because you choose when the compression happens. A clean, intentional compact is much more likely to preserve the right state than an automatic one at the edge of the window. Still, compaction should be treated as a recovery tool, not an excuse to run every session until it decays.
The healthier pattern is to exit before the context becomes expensive to salvage. Use /rename early so the session is findable. Use /handoff before closing long-running work so the durable state is captured: what changed, what is in progress, what remains, and what decisions are open. Then use /pickup to resume from a clean session with the right memory and without the accumulated worklog.
Keep durable knowledge out of the hot path
The same sunk-cost problem shows up in instruction files. It is tempting to put every runbook, design note, preference, and historical decision into CLAUDE.md so the model always has it. But always available also means always paid for. A 10k-token CLAUDE.md is 10k tokens on every turn, even when the current task has nothing to do with most of it.
CLAUDE.md should contain the rules needed for correct behavior in this repository: conventions, commands, constraints, and short pointers to where deeper material lives. Long-lived knowledge belongs in separate memory files or docs that can be loaded when relevant.
- Keep CLAUDE.md lean. It should be project-specific operating guidance, not a personal knowledge base.
- Use subagents for heavy exploration. Large code searches, incident investigations, and document summaries should happen in isolated windows that return only the result.
- Disable unused MCP servers. Tool schemas load whether you call the tools or not.
- Ask before loading more. A file read or tool call should answer the next decision, not satisfy vague curiosity.
Claude Code memory can live under ~/.claude/projects/<project>/memory/. The index stays available; per-topic files load when referenced. That is the difference between keeping every book on your desk and knowing which shelf they are on.
Use isolation before cleanup
Some work is inherently messy: searching a large codebase, reading several logs, comparing many files, or summarizing a document set. If you run all of that in the main session, every intermediate result lands in the context and stays there. By the time you get the answer, the scaffolding used to produce it may cost more than the answer itself.
Subagents change the shape of that cost. The expensive exploration happens in another context window, and the main session receives a focused result. An investigation that would add 50k tokens of reads, failed paths, and intermediate reasoning might come back as a 500-token summary. That is not only cheaper; it keeps the main thread more coherent.
If context has already gone wrong, use /rewind or /branch early. Rewinding three bad turns is cheap. Letting fifteen more turns accumulate on top of a bad tool result usually means the better move is to clear, hand off what matters, and restart cleanly.
Command reference
Use the smallest reset that removes the bad context while preserving the state you still need.
Inspect token counts, loaded tools, active skills, and overhead before a long task.
Do not use mid-task unless you are ready to lose the active thread.
Manual is better than automatic, but still lossy.
Discard bad turns while preserving earlier work.
Fork from the same point instead of discarding.
Make the session findable before you need to resume or hand it off.
Use for short pauses when the live session is still valuable.
Save durable state before closing.
Start clean with the necessary memory.
Return only the result to the main context.
The habit
You cannot fix a bad context with a better prompt. If the working set is full of stale files, huge tool results, irrelevant instructions, and half-finished paths, the model still has to reason through them. Context hygiene is something you practice before and during the task, not something you reach for only when the answer starts looking wrong.
Mop first, prompt later: start with the cleanest context that can solve the current problem, isolate expensive exploration, and carry durable state forward deliberately when a session has run its course.