[Saving Tokens While Using Claude] EP.1 — Hit the Limit in Three Hours

A friend asked me to build an e-commerce site for cosmetics. I’d been writing nothing but C at work, so this was a chance to try a new stack. Toss Payments for checkout, React for the frontend. “Every next-gen finance project is going React” — that comment stuck with me, and Toss Payments was Node-friendly. The direction came together quickly.

Obviously, I’d never used this stack before. I wanted to learn it, but the deadline didn’t allow for “study first, build later.” So I gave vibe coding a shot.

“Build me a shopping mall for selling cosmetics. Here’s the reference site URL.” That was all I said, and Claude started hammering away — then out came a mockup that covered things I hadn’t even thought of. It’s been over a year since vibe coding became a thing, and I was completely behind.

My plan to learn the new stack by reading Claude’s code was hopelessly naive. Claude produced code far faster than I could read it. So instead of reviewing it myself, I asked Claude to review its own work. (The end of the human era felt near.)

After using it for a while, Claude started losing track of earlier context. I googled “context” and found out why. I also found that splitting roles into sub-agents was a solution. I set up five: pm, dba, back, front, qa. I defined everything in CLAUDE.md — path conventions, session management guides, the works. Usage was in the mid-teens percent. Seemed like plenty.

Three hours later, I saw You've hit your limit · resets 3am.

The “resets at 3am” part made sense. But it didn’t make sense. Why three hours?

The buttons were wrong from the start

When I set up the five agents, I was working from a flawed hypothesis: “The more Claude thinks, the more tokens it costs.” No evidence. Just felt right.

I packed each agent with if-then rules. If I eliminated Claude’s need to reason, inference costs would drop — or so I believed. I filled CLAUDE.md with that same logic.

Only after hitting the limit did I actually google it. That’s when I learned I was wrong. Most of Claude’s token consumption isn’t reasoning — it’s reading. Every time an agent runs, it reads the entire CLAUDE.md from scratch. The CLAUDE.md I’d so diligently filled was being loaded into context in full, every single time.

I compressed the prose-style explanations into tables and cut duplicate entries. The burn rate dropped.

Splitting further should reduce tokens, right?

After the optimization, the Compacting conversation... messages became less frequent. Context was holding up well. I felt good about it.

But when I looked at actual token consumption, it was leaking even more.

“If I split roles into finer pieces, each agent handles a narrower scope.” That was the next idea.

I went from 5 to 12. api-designer, ui-designer, performance-engineer, security-auditor… A systematic, stage-by-stage structure.

Context held up fine. But tokens felt like they were leaking a bit more.

Merging back, with tears

I asked Claude to diagnose the cause. It felt like receiving a court verdict. token.. my precious…

Number of agents = number of CLAUDE.md reads. Twelve agents loading it twelve times each. The more I split, the faster it burned.

I had to consolidate. I’d put real effort into building them, but I merged them back with tears in my eyes.

api-designer absorbed into backend-developer, ui-designer into frontend-developer, performance-engineer into code-reviewer.

Item	Before	After	Savings
Total agent file size	112KB	40KB	-64%
Number of agents	12	9	-3

The burn rate changed noticeably.

So I went Max

Once the structure was in place, I truly felt how fast Claude works. Yes, it costs money. But the processing speed was beyond imagination. Work that would have taken me days finished in a single session.

I upgraded to the Max plan. Let’s use it wisely, without waste.

The structural problem was solved. I didn’t stop googling to protect my precious. Then I found it. When Claude reads a file, if it’s 500 lines, it reads all 500. When it runs tests, it loads the entire result into context — including passing cases. It wasn’t about the number of agents. It was about what Claude sees.

How could I make Claude see only what it needs?

References

The buttons were wrong from the start#

Splitting further should reduce tokens, right?#

Merging back, with tears#

So I went Max#

The buttons were wrong from the start

Splitting further should reduce tokens, right?

Merging back, with tears

So I went Max