Problem

Agent sessions spend tokens loading infrastructure (CLAUDE.md, AGENTS.md, policies, skills, plans) before any productive work begins, and the interpret-message skill adds further cost to every message. The baseline before any work is ~9,600 tokens, and each message adds ~2,100 tokens of overhead.

Token budget by source

Source              Tokens    When loaded     Waste type
CLAUDE.md           ~700      Every session   Setup procedures that run once
AGENTS.md           ~2,400    Every session   Philosophy + rules duplicated in policies
Policies (9 files)  ~1,700    Every session   All loaded even when irrelevant
interpret-message   ~2,100    Every message   Section 0 runs every message; gates overengineered
Skill registry      ~2,700    On-demand       Verbose table format
Plans directory     ~10,300   On-demand       Completed plans, accumulated logs
MEMORY.md           ~1,000    Every session   Duplicates CLAUDE.md

Reduction strategies

1. Split interpret-message into fast and full paths

The interpret-message skill has 9 steps. For simple commands (“commit this”, “fix the typo”), most of them are wasted. Split it into:

  • Fast path (~300 tokens): for simple commands, just perform the action, with no skill coverage check, no plan capture, and no text writing.
  • Full path (~2,100 tokens): for substantive messages that need the encoding loop.

The skill already has a “When NOT to run the full loop” section (lines 166-173), but the full 2,100 tokens are loaded before that check runs, so the exemption does not reduce the load cost.
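
A minimal sketch of the routing decision, meant to run before the full skill text is loaded. The patterns, the 12-word threshold, and the `choose_path` function are hypothetical illustrations, not part of the existing skill; anything that does not clearly match a simple command falls through to the full path.

```python
import re

# Hypothetical patterns for short imperative commands that can skip the full loop.
FAST_PATH_PATTERNS = [
    r"commit\b",
    r"fix (the |a )?typo\b",
    r"(re)?run (the )?tests?\b",
]

# Assumed cutoff: longer messages are treated as substantive.
MAX_FAST_WORDS = 12


def choose_path(message: str) -> str:
    """Return "fast" for simple commands and "full" for everything else."""
    text = message.strip().lower()
    if len(text.split()) > MAX_FAST_WORDS:
        return "full"
    # re.match anchors at the start of the string, so each pattern must open the message.
    if any(re.match(pattern, text) for pattern in FAST_PATH_PATTERNS):
        return "fast"
    # Anything unrecognized takes the full path, so substantive work is never skipped.
    return "full"


print(choose_path("commit this"))                           # fast
print(choose_path("fix the typo"))                          # fast
print(choose_path("Design a caching layer for MCP calls"))  # full
```

Defaulting to the full path keeps the failure mode cheap: a simple command routed to the full path costs ~1,800 extra tokens, while a substantive message routed to the fast path would skip the encoding loop entirely.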

2. Defer section 0 (improve skills from last turn)

Section 0 of interpret-message runs the skill improvement feedback loop on every message. It should run once per session (at the start of the NEXT session) or on explicit request. Move it to interpret-first-message or a dedicated feedback skill.
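
One way to enforce the once-per-session rule, sketched under the assumption that the harness keeps per-session state. `Session`, `handle_message`, and `run_section_0` are hypothetical names standing in for the real skill machinery.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical per-session state kept by the agent harness."""
    section_0_done: bool = False
    log: List[str] = field(default_factory=list)


def run_section_0(session: Session) -> None:
    # Stand-in for the "improve skills from last turn" feedback loop.
    session.log.append("section-0")


def handle_message(session: Session, message: str, explicit_request: bool = False) -> None:
    # Run the feedback loop on the first message of a session, or when explicitly asked.
    if explicit_request or not session.section_0_done:
        run_section_0(session)
        session.section_0_done = True
    session.log.append(f"handled: {message}")


session = Session()
for msg in ["commit this", "fix the typo", "add a test"]:
    handle_message(session, msg)
print(session.log.count("section-0"))  # 1
```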

3. Compile policies to a single summary

The 9 policy files total ~1,700 tokens. Most sessions only need 2-3 of them. Options:

  • Compile to one-liners: a single file with one sentence per policy (~200 tokens total). Full text available on demand.
  • Lazy load: only load policies relevant to the current task.
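
A sketch of the one-liner option, assuming each policy is a markdown file whose first non-heading sentence states the rule; the directory layout and the `compile_policies` and `first_sentence` helpers are assumptions. The full files stay on disk, so each summary line points to its source for on-demand reads.

```python
import re
from pathlib import Path


def first_sentence(text: str) -> str:
    """Return the first sentence of the first non-blank, non-heading line."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            match = re.match(r"(.+?[.!?])(\s|$)", line)
            return match.group(1) if match else line
    return ""


def compile_policies(policy_dir: Path) -> str:
    """Build a one-line-per-policy summary; full files stay on disk for on-demand reads."""
    lines = []
    for path in sorted(policy_dir.glob("*.md")):
        summary = first_sentence(path.read_text(encoding="utf-8"))
        lines.append(f"- {path.stem}: {summary} (full text: {path.name})")
    return "\n".join(lines) + "\n"
```

The result would be written to a single summary file kept outside the policies directory (otherwise a rerun would summarize the summary) and regenerated whenever a policy changes.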

4. Trim AGENTS.md

AGENTS.md contains philosophical context (~800 tokens) that doesn’t change session-to-session. Options:

  • Keep only operational rules in AGENTS.md (~600 tokens).
  • Link to specs for philosophical context instead of inlining it.

5. Archive completed plans

The plans directory has ~50 files. Completed and abandoned plans should be moved to plans/archive/ so they don’t appear in the review-plans output. The review-plans skill should only show active and proposed plans by default.
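
A sketch of the archive step and the default review filter, assuming each plan file records its state on a `Status:` line (for example `Status: completed`). The status vocabulary and both helper names are assumptions about how plans are written, not the current review-plans implementation.

```python
import re
import shutil
from pathlib import Path

# Assumed status vocabulary.
INACTIVE = {"completed", "abandoned"}
REVIEWABLE = {"active", "proposed"}
STATUS_RE = re.compile(r"^status:\s*(\w+)", re.IGNORECASE | re.MULTILINE)


def plan_status(path: Path) -> str:
    match = STATUS_RE.search(path.read_text(encoding="utf-8"))
    return match.group(1).lower() if match else "unknown"


def archive_plans(plans_dir: Path) -> list[str]:
    """Move inactive plans into plans_dir/archive/ and return the moved file names."""
    archive = plans_dir / "archive"
    archive.mkdir(exist_ok=True)
    moved = []
    for path in sorted(plans_dir.glob("*.md")):
        if plan_status(path) in INACTIVE:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved


def reviewable_plans(plans_dir: Path) -> list[str]:
    """Default review-plans listing: only active and proposed plans."""
    return [p.name for p in sorted(plans_dir.glob("*.md")) if plan_status(p) in REVIEWABLE]
```

Because `glob("*.md")` is not recursive, anything already in `archive/` drops out of the default listing without a separate check.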

6. Deduplicate MEMORY.md

MEMORY.md duplicates content from CLAUDE.md and AGENTS.md. Strip it to information that is ONLY in memory (learned preferences, session patterns) and not in any other loaded file.
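
Exact duplicates are the mechanical part of this cleanup. The sketch below (a hypothetical `dedupe_memory` helper) drops any MEMORY.md line that also appears in another always-loaded file after normalizing case, list bullets, and whitespace; paraphrased duplicates still need a manual pass.

```python
def normalize(line: str) -> str:
    # Compare case-insensitively, ignoring leading list bullets and extra whitespace.
    return " ".join(line.strip().lstrip("-*• ").lower().split())


def dedupe_memory(memory: str, *other_files: str) -> str:
    """Return MEMORY.md text without lines that already appear in other loaded files."""
    seen = {
        normalize(line)
        for text in other_files
        for line in text.splitlines()
        if normalize(line)
    }
    # Blank lines are always kept; non-blank lines survive only if not seen elsewhere.
    kept = [line for line in memory.splitlines() if not normalize(line) or normalize(line) not in seen]
    return "\n".join(kept) + "\n"
```

A call would pass the file contents, e.g. `dedupe_memory(Path("MEMORY.md").read_text(), Path("CLAUDE.md").read_text(), Path("AGENTS.md").read_text())`. Headings shared with another file are dropped too, so the output should be reviewed before it replaces MEMORY.md.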

Impact estimate

Strategy                     Savings                                  Where
Fast/full interpret-message  ~1,800 tokens/message (simple messages)  Per message
Defer section 0              ~400 tokens/message                      Per message
Compile policies             ~1,200 tokens/session                    Session start
Trim AGENTS.md               ~1,900 tokens/session                    Session start
Archive plans                ~5,000 tokens per review                 On-demand
Deduplicate MEMORY.md        ~700 tokens/session                      Session start

For a typical 10-message session (6 simple, 4 substantive):

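  • Current: ~9,600 + (10 × 2,100) = ~30,600 tokens of overhead
  • Optimized: ~5,800 + (6 × 300) + (4 × 1,700) = ~14,400 tokens, where 5,800 is the 9,600 baseline minus the ~3,800 session-start savings (1,200 + 1,900 + 700) and 1,700 is the full path after deferring section 0
  • Reduction: ~53% ((30,600 − 14,400) / 30,600)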
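
The arithmetic above, reproduced as a small script so the estimate can be rerun with other message mixes. All figures are the approximate ones from the two tables; the plan-archive saving is left out because it only applies when review-plans runs.

```python
from typing import List

# Approximate figures from the token budget and impact tables.
BASELINE_NOW = 9_600
PER_MESSAGE_NOW = 2_100
SESSION_SAVINGS = 1_200 + 1_900 + 700  # compile policies, trim AGENTS.md, dedupe MEMORY.md
FAST_PATH = PER_MESSAGE_NOW - 1_800    # ~300 tokens for simple messages
FULL_PATH = PER_MESSAGE_NOW - 400      # ~1,700 tokens once section 0 is deferred


def session_overhead(baseline: int, per_message: List[int]) -> int:
    """One-time session load plus the interpret-message cost of each message."""
    return baseline + sum(per_message)


current = session_overhead(BASELINE_NOW, [PER_MESSAGE_NOW] * 10)
optimized = session_overhead(BASELINE_NOW - SESSION_SAVINGS, [FAST_PATH] * 6 + [FULL_PATH] * 4)
reduction = 1 - optimized / current

print(current, optimized, f"{reduction:.0%}")  # 30600 14400 53%
```

Changing the 6/4 split shows how sensitive the estimate is to the message mix: a session of 10 substantive messages still saves the session-start tokens and the deferred section 0 cost, for about 25% instead of 53%.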

What this analysis does NOT cover

  • Token cost of reading source files during actual work (unavoidable).
  • Token cost of writing content (unavoidable).
  • Token cost of MCP tool calls (could be reduced by better caching but is a different problem).