# Systematic Debugging Four-phase debugging framework that ensures root cause investigation before attempting fixes. ## The Iron Law ``` NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST ``` If haven't completed Phase 1, cannot propose fixes. ## The Four Phases Must complete each phase before proceeding to next. ### Phase 1: Root Cause Investigation **BEFORE attempting ANY fix:** 1. **Read Error Messages Carefully** - Don't skip past errors/warnings, read stack traces completely 2. **Reproduce Consistently** - Can trigger reliably? Exact steps? If not reproducible → gather more data 3. **Check Recent Changes** - What changed? Git diff, recent commits, new dependencies, config changes 4. **Gather Evidence in Multi-Component Systems** - For EACH component boundary: log data entering/exiting, verify environment propagation - Run once to gather evidence showing WHERE it breaks - THEN analyze to identify failing component 5. **Trace Data Flow** - Where does bad value originate? Trace up call stack until finding source (see root-cause-tracing.md) ### Phase 2: Pattern Analysis **Find pattern before fixing:** 1. **Find Working Examples** - Locate similar working code in same codebase 2. **Compare Against References** - Read reference implementation COMPLETELY, understand fully before applying 3. **Identify Differences** - List every difference however small, don't assume "that can't matter" 4. **Understand Dependencies** - What other components, settings, config, environment needed? ### Phase 3: Hypothesis and Testing **Scientific method:** 1. **Form Single Hypothesis** - "I think X is root cause because Y", be specific not vague 2. **Test Minimally** - SMALLEST possible change to test hypothesis, one variable at a time 3. **Verify Before Continuing** - Worked? → Phase 4. Didn't work? → NEW hypothesis. DON'T add more fixes 4. **When Don't Know** - Say "I don't understand X", don't pretend, ask for help ### Phase 4: Implementation **Fix root cause, not symptom:** 1. **Create Failing Test Case** - Simplest reproduction, automated if possible, MUST have before fixing 2. **Implement Single Fix** - Address root cause identified, ONE change, no "while I'm here" improvements 3. **Verify Fix** - Test passes? No other tests broken? Issue actually resolved? 4. **If Fix Doesn't Work** - STOP. Count: How many fixes tried? - If < 3: Return to Phase 1, re-analyze with new information - **If ≥ 3: STOP and question architecture** 5. **If 3+ Fixes Failed: Question Architecture** - Pattern: Each fix reveals new shared state/coupling problem elsewhere - STOP and question fundamentals: Is pattern sound? Wrong architecture? - Discuss with human partner before more fixes ## Red Flags - STOP and Follow Process If catch yourself thinking: - "Quick fix for now, investigate later" - "Just try changing X and see if it works" - "Add multiple changes, run tests" - "Skip the test, I'll manually verify" - "It's probably X, let me fix that" - "I don't fully understand but this might work" - "One more fix attempt" (when already tried 2+) **ALL mean:** STOP. Return to Phase 1. ## Human Partner Signals You're Doing It Wrong - "Is that not happening?" - Assumed without verifying - "Will it show us...?" - Should have added evidence gathering - "Stop guessing" - Proposing fixes without understanding - "Ultrathink this" - Question fundamentals, not just symptoms - "We're stuck?" (frustrated) - Approach isn't working **When see these:** STOP. Return to Phase 1. ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Issue is simple, don't need process" | Simple issues have root causes too | | "Emergency, no time for process" | Systematic is FASTER than guess-and-check | | "Just try this first, then investigate" | First fix sets pattern. Do right from start | | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem | ## Real-World Impact From debugging sessions: - Systematic approach: 15-30 minutes to fix - Random fixes approach: 2-3 hours of thrashing - First-time fix rate: 95% vs 40% - New bugs introduced: Near zero vs common