Context Optimization
Extend effective context capacity by compacting, masking, caching, and partitioning what goes into the context window.
Four Core Strategies
| Strategy | Target | Reduction | When to Use |
|---|---|---|---|
| Compaction | Full context | 50-70% | Approaching limits |
| Observation Masking | Tool outputs | 60-80% | Tool outputs exceed 80% of context |
| KV-Cache Optimization | Repeated prefixes | 70%+ cache hit rate | Stable prompts |
| Context Partitioning | Work distribution | N/A | Parallelizable tasks |
Compaction
Summarize context when approaching limits.
Priority (compact first → last): Tool outputs → Old turns → Retrieved docs. Never compact the system prompt.
if context_tokens / context_limit > 0.8:
    context = compact_context(context)
Preserve: Key findings, decisions, commitments (remove supporting details)
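The priority order and the 0.8 threshold above can be sketched as follows. This is a minimal illustration, not a reference implementation: `Message`, `count_tokens`, `summarize`, and the `keep_recent_turns` parameter are hypothetical stand-ins. In practice `count_tokens` would be a real tokenizer and `summarize` a model call that keeps key findings, decisions, and commitments.

```python
from dataclasses import dataclass

@dataclass
class Message:
    kind: str  # "system", "tool_output", "turn", or "retrieved_doc"
    text: str

# Compaction order from the priority list; "system" is deliberately absent.
COMPACTION_ORDER = ["tool_output", "turn", "retrieved_doc"]

def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def summarize(text, max_words=8):
    """Placeholder summarizer: keeps the first few words (assumption)."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [...]"

def total_tokens(context):
    return sum(count_tokens(m.text) for m in context)

def compact_context(context, context_limit, threshold=0.8, keep_recent_turns=2):
    """Summarize messages in priority order until usage is at or below threshold."""
    context = list(context)  # copy so the caller's list is not mutated
    turn_indices = [i for i, m in enumerate(context) if m.kind == "turn"]
    # Only *old* turns are eligible; the most recent ones stay verbatim.
    protected = set(turn_indices[-keep_recent_turns:]) if keep_recent_turns else set()
    for kind in COMPACTION_ORDER:
        for i, m in enumerate(context):
            if total_tokens(context) / context_limit <= threshold:
                return context
            if m.kind == kind and i not in protected:
                context[i] = Message(m.kind, summarize(m.text))
    return context
```

Because the system prompt never appears in `COMPACTION_ORDER`, it passes through untouched no matter how far over the limit the context is.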
Observation Masking
Replace verbose tool outputs with compact references.
def mask_observation(observation, max_length):
    if len(observation) > max_length:
        ref_id = store_observation(observation)
        return f"[Obs:{ref_id}. Key: {extract_key(observation)}]"
    return observation
Never mask: Current task critical, most recent turn, active reasoning
Always mask: Repeated outputs, boilerplate, already summarized
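To make the masking step concrete, here is one possible self-contained sketch. The in-memory `_OBSERVATION_STORE`, the hash-based reference ID, the first-line behaviour of `extract_key`, and the `load_observation` helper are all assumptions for illustration. A real system might persist observations to disk or a database and extract keys with a model.

```python
import hashlib

# Hypothetical in-memory store; maps reference IDs to full observations.
_OBSERVATION_STORE = {}

def store_observation(observation):
    """Save the full text and return a short content-derived reference ID."""
    ref_id = hashlib.sha256(observation.encode("utf-8")).hexdigest()[:8]
    _OBSERVATION_STORE[ref_id] = observation
    return ref_id

def extract_key(observation, max_chars=60):
    """Assumed key extractor: the first non-empty line, truncated."""
    for line in observation.splitlines():
        if line.strip():
            return line.strip()[:max_chars]
    return ""

def mask_observation(observation, max_length=200):
    """Return short observations unchanged; replace long ones with a reference."""
    if len(observation) <= max_length:
        return observation
    ref_id = store_observation(observation)
    return f"[Obs:{ref_id}. Key: {extract_key(observation)}]"

def load_observation(ref_id):
    """Fetch the full observation again if the agent needs the details."""
    return _OBSERVATION_STORE[ref_id]
```

Keeping the full text retrievable by ID means masking is reversible: the agent can ask for the original output when the compact reference turns out not to be enough.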
KV-Cache Optimization
Reuse cached Key/Value tensors for identical prefixes.
# Cache-friendly ordering (stable first)
context = [system_prompt, tool_definitions] # Cacheable
context += [unique_content] # Variable last
Tips: Keep timestamps out of stable sections, keep formatting consistent, and keep the prompt structure stable across requests
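The effect of ordering can be illustrated with a small sketch that compares two requests and counts how many leading segments they share. It is a simplification: real KV caches match on token prefixes rather than whole segments, and `build_context` and `shared_prefix_len` are hypothetical helper names.

```python
def build_context(system_prompt, tool_definitions, unique_content):
    """Stable sections first, variable content last."""
    return [system_prompt, *tool_definitions, *unique_content]

def shared_prefix_len(a, b):
    """Number of leading segments two contexts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

SYSTEM = "You are a build assistant."
TOOLS = ["tool: run_tests", "tool: read_file"]

# Cache-friendly: the timestamp lives in the variable tail.
good_1 = build_context(SYSTEM, TOOLS, ["Time: 10:00", "User: fix test A"])
good_2 = build_context(SYSTEM, TOOLS, ["Time: 10:05", "User: fix test B"])

# Cache-hostile: a timestamp at the front changes the prefix on every request.
bad_1 = ["Time: 10:00", SYSTEM, *TOOLS, "User: fix test A"]
bad_2 = ["Time: 10:05", SYSTEM, *TOOLS, "User: fix test B"]

print(shared_prefix_len(good_1, good_2))  # 3
print(shared_prefix_len(bad_1, bad_2))    # 0
```

In the cache-friendly layout the system prompt and both tool definitions stay identical across requests, so they are candidates for reuse. In the second layout nothing after the leading timestamp can be reused.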
Context Partitioning
Split work across sub-agents with isolated contexts.
result = await sub_agent.process(subtask, clean_context=True)
coordinator.receive(result.summary) # Only essentials
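A runnable sketch of the coordinator and sub-agent pattern might look like the following, using `asyncio.gather` to run subtasks concurrently. The `SubAgent`, `Coordinator`, and `SubAgentResult` classes are hypothetical, and the `asyncio.sleep(0)` call stands in for real model or tool calls.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class SubAgentResult:
    summary: str

class SubAgent:
    """Hypothetical sub-agent that keeps its working notes to itself."""
    def __init__(self, name):
        self.name = name
        self.context = []

    async def process(self, subtask, clean_context=True):
        if clean_context:
            self.context = []  # start from an isolated, empty context
        self.context.append(subtask)
        await asyncio.sleep(0)  # stand-in for model and tool calls
        self.context.append(f"{self.name}: verbose working notes for {subtask}")
        # Only the condensed result leaves the sub-agent.
        return SubAgentResult(summary=f"{subtask}: done")

class Coordinator:
    def __init__(self):
        self.context = []

    def receive(self, summary):
        self.context.append(summary)

async def run_partitioned(subtasks):
    coordinator = Coordinator()
    agents = [SubAgent(f"agent-{i}") for i in range(len(subtasks))]
    # gather returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(
        *(agent.process(task, clean_context=True)
          for agent, task in zip(agents, subtasks))
    )
    for result in results:
        coordinator.receive(result.summary)  # only essentials
    return coordinator
```

Calling `asyncio.run(run_partitioned(["parse logs", "check config"]))` leaves the coordinator holding two short summaries, while each sub-agent's verbose notes stay in its own context.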
Decision Framework
| Dominant Component | Apply |
|---|---|
| Tool outputs | Observation masking |
| Retrieved docs | Summarization or partitioning |
| Message history | Compaction + summarization |
| Multiple | Combine strategies |
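The table can be turned into a small lookup, sketched below. The component names and the 0.5 dominance threshold are assumptions for illustration; the table itself does not say how large a share makes a component "dominant".

```python
# Mapping taken from the decision table; the dictionary keys are assumed names.
STRATEGY_BY_COMPONENT = {
    "tool_outputs": "Observation masking",
    "retrieved_docs": "Summarization or partitioning",
    "message_history": "Compaction + summarization",
}

def choose_strategy(token_breakdown, dominance=0.5):
    """Pick a strategy from a {component: token_count} breakdown."""
    total = sum(token_breakdown.values())
    if total == 0:
        return "No optimization needed"
    component, tokens = max(token_breakdown.items(), key=lambda kv: kv[1])
    # If no single component holds the assumed majority share, combine.
    if tokens / total < dominance:
        return "Combine strategies"
    return STRATEGY_BY_COMPONENT.get(component, "Combine strategies")
```

For example, a breakdown of 800 tool-output tokens against 100 each of retrieved docs and message history selects observation masking, while a roughly even split falls through to combining strategies.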
Guidelines
- Measure before optimizing
- Apply compaction before masking
- Design for cache stability
- Partition before context size becomes a problem
- Monitor effectiveness over time
- Balance savings vs quality