In EP.2, I used hooks to cut what Claude reads. Test results showed only failures, build logs only errors.
I decided to measure how much I’d actually saved.
Measure first
Claude Code logs every conversation as JSONL files. They stack up per project under ~/.claude/projects/, each entry containing token usage. I built a script (analyze-tokens.js) that reads these, cross-references with daily session files, tallies turns, cache_read/write/output separately, and converts everything to cost.
While building it, I caught a bug. The session recording hook was referencing e.usage, but the JSONL structure was e.message.usage. Tokens weren’t being logged to session files at all. The analyzer bypassed this by reading JSONL directly.
Here’s one day’s cost — March 27.
58% of total cost is reading
| Item | Cost | Share |
|---|---|---|
| cache_read | $19.22 | 58% |
| cache_write | ~$9.62 | 29% |
| output | ~$4.33 | 13% |
| Total | $33.17 |
cache_read is 58%. More than half of Claude’s cost comes from “reading.” In EP.1 I wrote “most of what Claude consumes is reading, not reasoning.” Seeing the numbers makes it certain.
What is it reading?
I tracked what Claude loads every turn.
| Item | When loaded | Size |
|---|---|---|
| Project CLAUDE.md | Every turn | 155–214 lines |
| Root CLAUDE.md | Every turn | 10 lines |
| MEMORY.md | Every turn | Under 23 lines |
| Conversation history | Every turn | Cumulative |
There was a surprising discovery. SKILL.md (469 lines) and agent docs (1,233 lines) don’t load every turn. They load only when a skill is invoked, only when an agent runs. Already lazy loading.
The only things loaded every turn are CLAUDE.md and MEMORY.md.
Splitting CLAUDE.md into rules/
Just because CLAUDE.md is 155–214 lines doesn’t mean all of it is needed every turn. The artifact path table is only needed when writing artifacts. Game domain rules are only needed for game-related work.
Claude Code has a .claude/rules/ directory. Create a file there with paths: frontmatter, and it loads only when accessing that path.
---
paths: tasks/**
---
Artifact path table, common header blocks, role-specific rules...
These rules load only when touching the tasks/ folder. They don’t load in general conversation.
I piloted it on one project first. Moved the 45-line artifact section to artifact-rules.md and deleted it from CLAUDE.md. It worked. Applied it across the rest.
| Project | CLAUDE.md (before→after) | Reduction | Files split out |
|---|---|---|---|
| A (e-commerce) | 155→96 lines | 38% | artifact-rules.md |
| B (education platform) | 207→140 lines | 32% | artifact-rules.md, edu-rules.md |
| C (game platform) | 214→114 lines | 47% | artifact-rules.md, game-domain.md |
C dropped 47%. Being a game platform, domain rules (46 lines) had been loading every turn. Now they only load when touching game-related files.
But here’s the thing
The numbers aren’t bad. 32–47% reduction. Looks good.
But honestly, this is “32–47% of 155–214 lines.” A few hundred tokens saved per turn. Most of that $19.22 in cache_read isn’t from CLAUDE.md’s 200 lines — it’s from cumulative conversation history. Run 20 turns in a session, and on turn 20, all 19 previous turns load as cache_read.
One project consumed more than half the day’s cost. Over 3x more expensive than the others. The reason was simple: 17 turns in a single session. cache_read spiked in the later turns.
No matter how much you diet CLAUDE.md, if you don’t address how conversation history accumulates, most of cache_read stays the same. EP.1 reduced agent count. EP.2 used hooks to cut output. EP.3 reduced static documents. All meaningful work, but the real leak is still there.
To cut or to continue, that is the question
Past 20 turns, starting a new session is more efficient for cache_read. But a new session means cache_write happens again. CLAUDE.md, MEMORY.md, system prompt — all written from scratch.
Cut sessions to reduce cache_read, and cache_write goes up. Keep sessions to reduce cache_write, and cache_read goes up. Either way, there’s a cost.
I don’t have the answer yet. Around 20 turns feels like the break-even point, but it depends on the type of work. Design-heavy conversations are better cut short. Continuous coding is better kept going.
That’s EP.3’s real discovery. I’ve cut what I can. What’s left is the domain of “how to use it.”
Other posts in this series
- EP.1 — Hit the Limit in Three Hours
- EP.2 — Cutting Costs with Hooks and Subagents
- Building a Discord Alert System
References