Context Fundamentals
Context = all input provided to LLM for task completion.
Anatomy of Context
| Component | Purpose | Token Impact |
|---|---|---|
| System Prompt | Identity, constraints, guidelines | Stable, cacheable |
| Tool Definitions | Action specs with params/returns | Grows with capabilities |
| Retrieved Docs | Domain knowledge, just-in-time | Variable, selective |
| Message History | Conversation state, task progress | Accumulates over time |
| Tool Outputs | Results from actions | 83.9% of typical context |
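A minimal sketch of how these components come together in one request; the payload shape and field names below are illustrative, not any particular provider's API:

```python
# Illustrative only: assemble the context components into a single request payload.
def build_context(system_prompt, tool_definitions, retrieved_docs, message_history):
    return {
        "system": system_prompt,        # stable, cacheable
        "tools": tool_definitions,      # grows with agent capabilities
        "messages": [
            {"role": "user", "content": f"[{doc['id']}]\n{doc['text']}"}
            for doc in retrieved_docs   # just-in-time domain knowledge
        ] + message_history,            # conversation state + accumulated tool outputs
    }
```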
Attention Mechanics
- U-shaped curve: Tokens at the beginning and end of the context get more attention than those in the middle ("lost in the middle")
- Attention budget: Self-attention relates every token to every other, n^2 pairwise relationships for n tokens (~10^8 pairs at 10k tokens), so per-token attention thins as context grows
- Position encoding: Interpolation extends models to sequences longer than their training length, at the cost of some degradation
- First-token sink: The BOS token absorbs a disproportionately large share of the attention budget
System Prompt Structure
```xml
<BACKGROUND_INFORMATION>Domain knowledge, role definition</BACKGROUND_INFORMATION>
<INSTRUCTIONS>Step-by-step procedures</INSTRUCTIONS>
<TOOL_GUIDANCE>When/how to use tools</TOOL_GUIDANCE>
<OUTPUT_DESCRIPTION>Format requirements</OUTPUT_DESCRIPTION>
```
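A minimal sketch of rendering that template; the helper name and arguments are illustrative:

```python
# Sketch: compose a system prompt from the four sections above.
def render_system_prompt(background, instructions, tool_guidance, output_description):
    return (
        f"<BACKGROUND_INFORMATION>{background}</BACKGROUND_INFORMATION>\n"
        f"<INSTRUCTIONS>{instructions}</INSTRUCTIONS>\n"
        f"<TOOL_GUIDANCE>{tool_guidance}</TOOL_GUIDANCE>\n"
        f"<OUTPUT_DESCRIPTION>{output_description}</OUTPUT_DESCRIPTION>"
    )
```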
Progressive Disclosure Levels
- Metadata (~100 words) - Always in context
- SKILL.md body (<5k words) - When skill triggers
- Bundled resources (Unlimited) - As needed
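A sketch of the three levels in code, assuming a hypothetical skill layout with a SKILL.md file and a resources/ directory:

```python
from pathlib import Path

# Hypothetical progressive-disclosure loader.
def load_skill_context(skill_dir: Path, metadata: str, triggered: bool, resource=None):
    parts = [metadata]                                          # level 1: always in context
    if triggered:
        parts.append((skill_dir / "SKILL.md").read_text())      # level 2: when skill triggers
    if resource:
        parts.append((skill_dir / "resources" / resource).read_text())  # level 3: as needed
    return "\n\n".join(parts)
```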
Token Budget Allocation
| Component | Typical Range (tokens) | Notes |
|---|---|---|
| System Prompt | 500-2000 | Stable, optimize once |
| Tool Definitions | 100-500 per tool | Keep under 20 tools |
| Retrieved Docs | 1000-5000 | Selective loading |
| Message History | Variable | Summarize at ~70% of the window |
| Reserved Buffer | 10-20% | For responses |
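A sketch of checking usage against an explicit budget with a reserved response buffer; the 4-characters-per-token estimate is a crude stand-in for a real tokenizer:

```python
# Rough stand-in for a model tokenizer (~4 characters per token).
def count_tokens(text):
    return len(text) // 4

# Sketch: reserve 10-20% of the window for the response, budget the rest.
def fits_budget(parts, window=128_000, reserve_frac=0.15):
    used = sum(count_tokens(p) for p in parts)      # system prompt, tools, docs, history
    budget = int(window * (1 - reserve_frac))
    return used <= budget, used, budget
```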
Document Management
- Strong identifiers: customer_pricing_rates.json, not data/file1.json
- Chunk at semantic boundaries: Paragraphs and sections, not arbitrary lengths
- Include metadata: Source, date, relevance score
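A sketch of paragraph-boundary chunking with attached metadata; the field names are illustrative:

```python
from datetime import date

# Sketch: chunk at paragraph boundaries and attach source metadata,
# rather than slicing at fixed character offsets.
def chunk_document(doc_id, text, source):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {
            "id": f"{doc_id}#p{i}",      # strong identifier
            "text": para,
            "source": source,
            "date": date.today().isoformat(),
            "relevance_score": None,     # filled in at retrieval time
        }
        for i, para in enumerate(paragraphs)
    ]
```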
Message History Pattern
```python
# Summary injection every 20 messages
if len(messages) % 20 == 0:
    summary = summarize_conversation(messages[-20:])
    messages.append({"role": "system", "content": f"Summary: {summary}"})
```
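summarize_conversation is left abstract above; a hedged sketch, assuming a generic llm client with a complete(prompt) method (not a specific SDK):

```python
# Hypothetical helper for the snippet above; `llm.complete` is an assumed
# method on whatever client you actually use.
def summarize_conversation(messages):
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    prompt = (
        "Summarize this conversation in under 150 words, preserving "
        "decisions, open questions, and current task state:\n\n" + transcript
    )
    return llm.complete(prompt)
```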
Guidelines
- Treat context as finite with diminishing returns
- Place critical info at attention-favored positions
- Use file-system-based access for large documents
- Pre-load stable content; load dynamic content just-in-time
- Design with explicit token budgets
- Monitor usage and trigger compaction at 70-80% of the window (see the sketch below)
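A sketch of the compaction trigger from the last guideline; the threshold and injected helpers are illustrative:

```python
# Sketch: compact once usage crosses ~75% of the context window.
COMPACTION_THRESHOLD = 0.75

def maybe_compact(messages, window_tokens, count_tokens, compact):
    used = sum(count_tokens(m["content"]) for m in messages)
    if used / window_tokens >= COMPACTION_THRESHOLD:
        return compact(messages)   # e.g., summarize older turns, keep recent ones verbatim
    return messages
```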
Related Topics
- Context Degradation - Failure patterns
- Context Optimization - Efficiency techniques
- Memory Systems - External storage