103 lines
4.1 KiB
Markdown
103 lines
4.1 KiB
Markdown
# Systematic Debugging
|
|
|
|
Four-phase debugging framework that ensures root cause investigation before attempting fixes.
|
|
|
|
## The Iron Law
|
|
|
|
```
|
|
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
|
|
```
|
|
|
|
If haven't completed Phase 1, cannot propose fixes.
|
|
|
|
## The Four Phases
|
|
|
|
Must complete each phase before proceeding to next.
|
|
|
|
### Phase 1: Root Cause Investigation
|
|
|
|
**BEFORE attempting ANY fix:**
|
|
|
|
1. **Read Error Messages Carefully** - Don't skip past errors/warnings, read stack traces completely
|
|
2. **Reproduce Consistently** - Can trigger reliably? Exact steps? If not reproducible → gather more data
|
|
3. **Check Recent Changes** - What changed? Git diff, recent commits, new dependencies, config changes
|
|
4. **Gather Evidence in Multi-Component Systems**
|
|
- For EACH component boundary: log data entering/exiting, verify environment propagation
|
|
- Run once to gather evidence showing WHERE it breaks
|
|
- THEN analyze to identify failing component
|
|
5. **Trace Data Flow** - Where does bad value originate? Trace up call stack until finding source (see root-cause-tracing.md)
|
|
|
|
### Phase 2: Pattern Analysis
|
|
|
|
**Find pattern before fixing:**
|
|
|
|
1. **Find Working Examples** - Locate similar working code in same codebase
|
|
2. **Compare Against References** - Read reference implementation COMPLETELY, understand fully before applying
|
|
3. **Identify Differences** - List every difference however small, don't assume "that can't matter"
|
|
4. **Understand Dependencies** - What other components, settings, config, environment needed?
|
|
|
|
### Phase 3: Hypothesis and Testing
|
|
|
|
**Scientific method:**
|
|
|
|
1. **Form Single Hypothesis** - "I think X is root cause because Y", be specific not vague
|
|
2. **Test Minimally** - SMALLEST possible change to test hypothesis, one variable at a time
|
|
3. **Verify Before Continuing** - Worked? → Phase 4. Didn't work? → NEW hypothesis. DON'T add more fixes
|
|
4. **When Don't Know** - Say "I don't understand X", don't pretend, ask for help
|
|
|
|
### Phase 4: Implementation
|
|
|
|
**Fix root cause, not symptom:**
|
|
|
|
1. **Create Failing Test Case** - Simplest reproduction, automated if possible, MUST have before fixing
|
|
2. **Implement Single Fix** - Address root cause identified, ONE change, no "while I'm here" improvements
|
|
3. **Verify Fix** - Test passes? No other tests broken? Issue actually resolved?
|
|
4. **If Fix Doesn't Work**
|
|
- STOP. Count: How many fixes tried?
|
|
- If < 3: Return to Phase 1, re-analyze with new information
|
|
- **If ≥ 3: STOP and question architecture**
|
|
5. **If 3+ Fixes Failed: Question Architecture**
|
|
- Pattern: Each fix reveals new shared state/coupling problem elsewhere
|
|
- STOP and question fundamentals: Is pattern sound? Wrong architecture?
|
|
- Discuss with human partner before more fixes
|
|
|
|
## Red Flags - STOP and Follow Process
|
|
|
|
If catch yourself thinking:
|
|
- "Quick fix for now, investigate later"
|
|
- "Just try changing X and see if it works"
|
|
- "Add multiple changes, run tests"
|
|
- "Skip the test, I'll manually verify"
|
|
- "It's probably X, let me fix that"
|
|
- "I don't fully understand but this might work"
|
|
- "One more fix attempt" (when already tried 2+)
|
|
|
|
**ALL mean:** STOP. Return to Phase 1.
|
|
|
|
## Human Partner Signals You're Doing It Wrong
|
|
|
|
- "Is that not happening?" - Assumed without verifying
|
|
- "Will it show us...?" - Should have added evidence gathering
|
|
- "Stop guessing" - Proposing fixes without understanding
|
|
- "Ultrathink this" - Question fundamentals, not just symptoms
|
|
- "We're stuck?" (frustrated) - Approach isn't working
|
|
|
|
**When see these:** STOP. Return to Phase 1.
|
|
|
|
## Common Rationalizations
|
|
|
|
| Excuse | Reality |
|
|
|--------|---------|
|
|
| "Issue is simple, don't need process" | Simple issues have root causes too |
|
|
| "Emergency, no time for process" | Systematic is FASTER than guess-and-check |
|
|
| "Just try this first, then investigate" | First fix sets pattern. Do right from start |
|
|
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem |
|
|
|
|
## Real-World Impact
|
|
|
|
From debugging sessions:
|
|
- Systematic approach: 15-30 minutes to fix
|
|
- Random fixes approach: 2-3 hours of thrashing
|
|
- First-time fix rate: 95% vs 40%
|
|
- New bugs introduced: Near zero vs common
|