Problem
Agent sessions spend tokens on infrastructure loading (reading CLAUDE.md, AGENTS.md, policies, skills, plans) before any productive work begins. Per-message overhead (the interpret-message skill) adds further cost. The total baseline before any work is ~9,600 tokens, and each message adds ~2,100 tokens of overhead.
Token budget by source
| Source | Tokens | When loaded | Waste type |
|---|---|---|---|
| CLAUDE.md | ~700 | Every session | Setup procedures that run once |
| AGENTS.md | ~2,400 | Every session | Philosophy + rules duplicated in policies |
| Policies (9 files) | ~1,700 | Every session | All loaded even when irrelevant |
| interpret-message | ~2,100 | Every message | Section 0 runs every message; gates overengineered |
| Skill registry | ~2,700 | On-demand | Verbose table format |
| Plans directory | ~10,300 | On-demand | Completed plans, accumulated logs |
| MEMORY.md | ~1,000 | Every session | Duplicates CLAUDE.md |
Reduction strategies
1. Split interpret-message into fast and full paths
The interpret-message skill has 9 steps. For simple commands (“commit this”, “fix the typo”), most steps are waste. Split into:
- Fast path (~300 tokens): for simple commands — just do the action. No skill coverage check, no plan capture, no text writing.
- Full path (~2,100 tokens): for substantive messages that need the encoding loop.
The skill already has a “When NOT to run the full loop” section (lines 166-173), but the full ~2,100 tokens are loaded before that check is reached, so the check does not reduce the load cost.
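A minimal sketch of how the fast/full split could be decided before any skill text is loaded. The verb list, the word limit, and the function names are illustrative assumptions, not part of the existing skill; the token figures are the estimates quoted above.

```python
import re

# Token estimates quoted in this plan.
FAST_PATH_TOKENS = 300
FULL_PATH_TOKENS = 2_100

# Assumption: simple commands are short and start with an imperative verb.
# Both the verb list and the word limit are hypothetical and would need tuning.
SIMPLE_VERBS = ("commit", "fix", "rename", "run", "push", "revert", "format")
MAX_SIMPLE_WORDS = 8


def choose_path(message: str) -> str:
    """Return "fast" for short imperative commands, "full" for everything else."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words or len(words) > MAX_SIMPLE_WORDS:
        return "full"
    return "fast" if words[0] in SIMPLE_VERBS else "full"


def overhead_tokens(message: str) -> int:
    """Estimated interpret-message overhead for one message."""
    return FAST_PATH_TOKENS if choose_path(message) == "fast" else FULL_PATH_TOKENS
```

The point of the sketch is ordering: the routing decision is cheap and runs first, so the full skill text is only loaded when the message needs it.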
2. Defer section 0 (improve skills from last turn)
Section 0 of interpret-message runs the skill improvement feedback loop every message. This should run once per session (at the start of the NEXT session) or on explicit request, not every message. Move it to interpret-first-message or a dedicated feedback skill.
3. Compile policies to a single summary
The 9 policy files total ~1,700 tokens. Most sessions only need 2-3 of them. Options:
- Compile to one-liners: a single file with one sentence per policy (~200 tokens total). Full text available on demand.
- Lazy load: only load policies relevant to the current task.
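The one-liner option could be generated mechanically. The sketch below assumes each policy file is markdown whose first non-heading line opens with a sentence that states the rule; that layout, and the function names, are assumptions for illustration only.

```python
def first_sentence(text: str) -> str:
    """Return the first sentence of the first non-blank, non-heading line."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            head, sep, _ = line.partition(". ")
            # If the line holds several sentences keep the first; else keep the line.
            return head + "." if sep else line
    return ""


def compile_summary(policies: dict[str, str]) -> str:
    """Build a one-line-per-policy summary from {policy name: full text}."""
    return "\n".join(
        f"- {name}: {first_sentence(body)}"
        for name, body in sorted(policies.items())
    )
```

For example, `compile_summary({"git": "# Git\nNever force-push. Ask first."})` yields `- git: Never force-push.`; the full policy text would still be read on demand.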
4. Trim AGENTS.md
AGENTS.md contains philosophical context (~800 tokens) that doesn’t change session-to-session. Options:
- Keep only operational rules in AGENTS.md (~600 tokens).
- Link to specs for philosophical context instead of inlining it.
5. Archive completed plans
The plans directory has ~50 files. Completed and abandoned plans should be moved to plans/archive/ so they don’t appear in the review-plans output. The review-plans skill should only show active and proposed plans by default.
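A sketch of the archiving step, assuming each plan is a `.md` file containing a `status:` line. That convention and the function names are hypothetical; the real plan files may record status differently.

```python
from pathlib import Path
import shutil

# Statuses that should leave the active plans directory.
ARCHIVED_STATUSES = {"completed", "abandoned"}


def plan_status(path: Path) -> str:
    """Read the value of the first 'status:' line, or 'unknown' if absent."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.lower().startswith("status:"):
            return line.split(":", 1)[1].strip().lower()
    return "unknown"


def archive_finished_plans(plans_dir: Path) -> list[str]:
    """Move completed and abandoned plans into plans_dir/archive/."""
    archive = plans_dir / "archive"
    moved = []
    # glob("*.md") is non-recursive, so files already in archive/ are skipped.
    for plan in sorted(plans_dir.glob("*.md")):
        if plan_status(plan) in ARCHIVED_STATUSES:
            archive.mkdir(exist_ok=True)
            shutil.move(str(plan), str(archive / plan.name))
            moved.append(plan.name)
    return moved
```

Plans with no recognizable status stay in place, which errs on the side of keeping them visible in review-plans.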
6. Deduplicate MEMORY.md
MEMORY.md duplicates content from CLAUDE.md and AGENTS.md. Strip it to information that is ONLY in memory (learned preferences, session patterns) and not in any other loaded file.
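One rough way to find the duplicated material is a line-level comparison. The sketch below only catches lines that match after whitespace and case normalization; paraphrased duplicates would still need manual review. The function names are illustrative.

```python
def normalize(line: str) -> str:
    """Lowercase and collapse whitespace so trivially different lines compare equal."""
    return " ".join(line.lower().split())


def unique_memory_lines(memory: str, *others: str) -> list[str]:
    """Return the non-blank lines of `memory` that appear in none of `others`."""
    seen = {normalize(line) for text in others for line in text.splitlines()}
    seen.discard("")
    return [
        line
        for line in memory.splitlines()
        if normalize(line) and normalize(line) not in seen
    ]
```

Here `memory` would be the text of MEMORY.md and `others` the text of CLAUDE.md and AGENTS.md; the result is a candidate for the stripped-down file.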
Impact estimate
| Strategy | Savings | Where |
|---|---|---|
| Fast/full interpret-message | ~1,800 tokens/message (for simple msgs) | Per message |
| Defer section 0 | ~400 tokens/message | Per message |
| Compile policies | ~1,200 tokens/session | Session start |
| Trim AGENTS.md | ~1,900 tokens/session | Session start |
| Archive plans | ~5,000 tokens on review | On-demand |
| Deduplicate MEMORY.md | ~700 tokens/session | Session start |
For a typical 10-message session (6 simple, 4 substantive):
- Current: ~9,600 + (10 × 2,100) = ~30,600 tokens overhead
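- Optimized: ~5,800 + (6 × 300) + (4 × 1,700) = ~14,400 tokens, where ~5,800 is the baseline less the ~3,800 of session-start savings and ~1,700 is the full path less the ~400 for section 0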
- Reduction: ~53%
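The arithmetic above can be reproduced with a few lines, which also makes it easy to try other session mixes. All figures are the estimates from the tables in this plan; the function name is illustrative.

```python
# Estimates taken from the tables in this plan.
BASELINE_CURRENT = 9_600
SESSION_START_SAVINGS = 1_200 + 1_900 + 700  # policies + AGENTS.md + MEMORY.md
BASELINE_OPTIMIZED = BASELINE_CURRENT - SESSION_START_SAVINGS  # 5,800

FULL_PATH_CURRENT = 2_100
FULL_PATH_OPTIMIZED = FULL_PATH_CURRENT - 400  # section 0 deferred
FAST_PATH = 300


def session_overhead(baseline: int, simple: int, substantive: int,
                     simple_cost: int, substantive_cost: int) -> int:
    """Total overhead tokens for one session."""
    return baseline + simple * simple_cost + substantive * substantive_cost


# Typical session: 6 simple messages, 4 substantive ones.
current = session_overhead(BASELINE_CURRENT, 6, 4,
                           FULL_PATH_CURRENT, FULL_PATH_CURRENT)
optimized = session_overhead(BASELINE_OPTIMIZED, 6, 4,
                             FAST_PATH, FULL_PATH_OPTIMIZED)
reduction = 1 - optimized / current

print(f"current={current:,} optimized={optimized:,} reduction={reduction:.0%}")
```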
What this analysis does NOT cover
- Token cost of reading source files during actual work (unavoidable).
- Token cost of writing content (unavoidable).
- Token cost of MCP tool calls (could be reduced by better caching but is a different problem).