# Context Fundamentals

Context = all input provided to the LLM for task completion.

## Anatomy of Context

| Component | Purpose | Token Impact |
|---|---|---|
| System Prompt | Identity, constraints, guidelines | Stable, cacheable |
| Tool Definitions | Action specs with params/returns | Grows with capabilities |
| Retrieved Docs | Domain knowledge, just-in-time | Variable, selective |
| Message History | Conversation state, task progress | Accumulates over time |
| Tool Outputs | Results from actions | 83.9% of typical context |

## Attention Mechanics

- **U-shaped curve**: beginning and end of context receive more attention than the middle
- **Attention budget**: self-attention scores n² pairwise relationships for n tokens, so the budget depletes as context grows
- **Position encoding**: interpolation extends sequences beyond the training length, with some degradation
- **First-token sink**: the BOS token absorbs a large share of the attention budget
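The quadratic growth behind the attention budget is easy to see with a quick back-of-the-envelope calculation; this is an illustrative sketch, not a figure for any specific model:

```python
# Pairwise attention: every token attends to every token,
# so the number of relationships grows quadratically.
def attention_pairs(n_tokens: int) -> int:
    """Number of token-to-token relationships the model must score."""
    return n_tokens * n_tokens

# Doubling the context quadruples the attention work:
print(attention_pairs(1_000))  # 1000000
print(attention_pairs(2_000))  # 4000000
```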

## System Prompt Structure

```xml
<BACKGROUND_INFORMATION>Domain knowledge, role definition</BACKGROUND_INFORMATION>
<INSTRUCTIONS>Step-by-step procedures</INSTRUCTIONS>
<TOOL_GUIDANCE>When/how to use tools</TOOL_GUIDANCE>
<OUTPUT_DESCRIPTION>Format requirements</OUTPUT_DESCRIPTION>
```

## Progressive Disclosure Levels

  1. Metadata (~100 words) - always in context
  2. SKILL.md body (<5k words) - when the skill triggers
  3. Bundled resources (unlimited) - loaded as needed
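The three levels can be sketched as a loading policy. This is a minimal illustration: the in-memory `FILES` store and the `build_context` helper are assumptions standing in for real file access:

```python
# Hypothetical in-memory store standing in for the skill's files on disk.
FILES = {
    "SKILL.md": "Full skill body (procedures, examples)...",
    "references/context.md": "Bundled reference document...",
}

def build_context(metadata: str, triggered: bool, needed: list[str]) -> list[str]:
    context = [metadata]                   # level 1: metadata is always loaded
    if triggered:
        context.append(FILES["SKILL.md"])  # level 2: body only when the skill fires
    for path in needed:
        context.append(FILES[path])        # level 3: resources on demand
    return context
```

Until the skill triggers, only the ~100-word metadata occupies tokens.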

## Token Budget Allocation

| Component | Typical Range (tokens) | Notes |
|---|---|---|
| System Prompt | 500-2000 | Stable, optimize once |
| Tool Definitions | 100-500 per tool | Keep under 20 tools |
| Retrieved Docs | 1000-5000 | Selective loading |
| Message History | Variable | Summarize at 70% |
| Reserved Buffer | 10-20% of window | For responses |
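The ranges above can be turned into an explicit budget check. The window size and per-component numbers below are illustrative assumptions, not recommendations:

```python
# Illustrative allocation for a 100k-token window, using the ranges above.
CONTEXT_WINDOW = 100_000
RESERVED_BUFFER = int(CONTEXT_WINDOW * 0.15)  # 10-20% kept free for responses

FIXED_BUDGET = {
    "system_prompt": 2_000,
    "tool_definitions": 15 * 500,  # ~500 tokens/tool, under 20 tools
    "retrieved_docs": 5_000,
}
# Whatever remains after fixed components and the buffer belongs to history.
HISTORY_BUDGET = CONTEXT_WINDOW - RESERVED_BUFFER - sum(FIXED_BUDGET.values())

def should_summarize(history_tokens: int) -> bool:
    """Summarize message history once it passes 70% of its budget."""
    return history_tokens > 0.7 * HISTORY_BUDGET
```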

## Document Management

- **Strong identifiers**: `customer_pricing_rates.json`, not `data/file1.json`
- **Chunk at semantic boundaries**: paragraphs and sections, not arbitrary lengths
- **Include metadata**: source, date, relevance score
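A minimal chunker following these rules, splitting on blank lines as the semantic boundary and attaching metadata; the field names here are assumptions, not a fixed schema:

```python
def chunk_document(text: str, source: str, date: str) -> list[dict]:
    """Split at paragraph boundaries (blank lines), never mid-paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"content": p, "source": source, "date": date, "chunk": i}
        for i, p in enumerate(paragraphs)
    ]
```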

## Message History Pattern

```python
# Compact every 20 messages: fold the oldest 20 into a summary,
# so history stops growing without bound.
if len(messages) >= 20 and len(messages) % 20 == 0:
    summary = summarize_conversation(messages[:20])
    messages = [{"role": "system", "content": f"Summary: {summary}"}] + messages[20:]
```

## Guidelines

  1. Treat context as a finite resource with diminishing returns
  2. Place critical information at attention-favored positions (beginning and end)
  3. Use file-system-based access for large documents
  4. Pre-load stable content; load dynamic content just-in-time
  5. Design with explicit token budgets
  6. Monitor usage and trigger compaction at 70-80% of the window