english/.opencode/skills/ck-debug/references/systematic-debugging.md
2026-04-12 01:06:31 +07:00

Systematic Debugging

Four-phase debugging framework that ensures root cause investigation before attempting fixes.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

The Four Phases

Complete each phase before proceeding to the next.

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:

  1. Read Error Messages Carefully - Don't skip past errors/warnings, read stack traces completely
  2. Reproduce Consistently - Can you trigger it reliably? What are the exact steps? If not reproducible → gather more data
  3. Check Recent Changes - What changed? Git diff, recent commits, new dependencies, config changes
  4. Gather Evidence in Multi-Component Systems
    • For EACH component boundary: log data entering/exiting, verify environment propagation
    • Run once to gather evidence showing WHERE it breaks
    • THEN analyze to identify failing component
  5. Trace Data Flow - Where does bad value originate? Trace up call stack until finding source (see root-cause-tracing.md)
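
The evidence-gathering step (item 4) can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `boundary` decorator, the `trace` list, and the three-stage pipeline (`load`, `parse`, `report`) are all hypothetical names invented for the example, and the bug in `parse` is planted on purpose.

```python
import functools

# Evidence collected during a single run: (component, value in, value out).
trace = []


def boundary(component):
    """Record what enters and leaves one component (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(value):
            result = func(value)
            trace.append((component, value, result))
            return result
        return wrapper
    return decorator


@boundary("load")
def load(raw):
    return raw.strip()


@boundary("parse")
def parse(text):
    # Planted bug for illustration: the minus sign is discarded.
    return int(text.lstrip("-"))


@boundary("report")
def report(amount):
    return f"balance={amount}"


def run(raw):
    """Run the whole pipeline once, gathering evidence at every boundary."""
    trace.clear()
    return report(parse(load(raw)))


if __name__ == "__main__":
    run(" -42 ")
    # Analyze only after the run: find the first boundary where data goes bad.
    for component, entered, exited in trace:
        print(f"{component}: in={entered!r} out={exited!r}")
```

Reading the trace top to bottom shows the value is still correct leaving `load` and already wrong leaving `parse`, which locates the failing component before any fix is proposed.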

Phase 2: Pattern Analysis

Find pattern before fixing:

  1. Find Working Examples - Locate similar working code in same codebase
  2. Compare Against References - Read reference implementation COMPLETELY, understand fully before applying
  3. Identify Differences - List every difference however small, don't assume "that can't matter"
  4. Understand Dependencies - What other components, settings, config, environment needed?
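
"List every difference however small" can be made mechanical when the working and broken cases are described by settings. The sketch below assumes both are plain dictionaries with string keys; `list_differences` and the sample settings are hypothetical, chosen only to show that a type mismatch (`3` vs `"3"`) and a missing key are reported rather than dismissed.

```python
_MISSING = object()  # sentinel so that a real value of None is still compared


def list_differences(working, broken):
    """Return {key: (working_value, broken_value)} for every key that differs.

    Keys present on only one side are reported with "<absent>" on the other.
    """
    differences = {}
    for key in sorted(set(working) | set(broken)):
        w = working.get(key, _MISSING)
        b = broken.get(key, _MISSING)
        if w is _MISSING or b is _MISSING or w != b:
            differences[key] = (
                "<absent>" if w is _MISSING else w,
                "<absent>" if b is _MISSING else b,
            )
    return differences


if __name__ == "__main__":
    working = {"timeout": 30, "retries": 3, "tls": True}
    broken = {"timeout": 30, "retries": "3", "region": "eu"}
    for key, (w, b) in list_differences(working, broken).items():
        print(f"{key}: working={w!r} broken={b!r}")
```

Every reported difference is a candidate, including the ones that look like they "can't matter".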

Phase 3: Hypothesis and Testing

Scientific method:

  1. Form Single Hypothesis - "I think X is root cause because Y", be specific not vague
  2. Test Minimally - SMALLEST possible change to test hypothesis, one variable at a time
  3. Verify Before Continuing - Worked? → Phase 4. Didn't work? → NEW hypothesis. DON'T add more fixes
  4. When You Don't Know - Say "I don't understand X", don't pretend, ask for help
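
One way to keep a test minimal is to make the harness refuse anything but a single-variable change. The following is a sketch under stated assumptions: `try_hypothesis`, the `reproduces_bug` check, and the `pool_size` / `timeout` settings are hypothetical stand-ins for a real reproduction script and real configuration.

```python
def try_hypothesis(baseline, change, still_fails):
    """Apply exactly one change to a copy of the baseline and re-run the check.

    Returns "Phase 4" if the bug no longer reproduces, otherwise
    "new hypothesis". The baseline itself is never modified.
    """
    if len(change) != 1:
        raise ValueError("test one variable at a time")
    trial = {**baseline, **change}
    return "new hypothesis" if still_fails(trial) else "Phase 4"


def reproduces_bug(config):
    # Hypothetical stand-in for a real reproduction: fails with a tiny pool.
    return config["pool_size"] < 2


if __name__ == "__main__":
    baseline = {"pool_size": 1, "timeout": 5}
    # Hypothesis: "the pool is exhausted because pool_size is 1".
    print(try_hypothesis(baseline, {"pool_size": 4}, reproduces_bug))
```

If the result is "new hypothesis", the failed change is discarded rather than stacked with the next one, because each trial starts again from the untouched baseline.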

Phase 4: Implementation

Fix root cause, not symptom:

  1. Create Failing Test Case - Simplest reproduction, automated if possible, MUST have before fixing
  2. Implement Single Fix - Address root cause identified, ONE change, no "while I'm here" improvements
  3. Verify Fix - Test passes? No other tests broken? Issue actually resolved?
  4. If Fix Doesn't Work
    • STOP. Count: How many fixes tried?
    • If < 3: Return to Phase 1, re-analyze with new information
    • If ≥ 3: STOP and question architecture
  5. If 3+ Fixes Failed: Question Architecture
    • Pattern: Each fix reveals new shared state/coupling problem elsewhere
    • STOP and question fundamentals: Is pattern sound? Wrong architecture?
    • Discuss with human partner before more fixes
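
The escalation rule in items 4 and 5 reduces to a single threshold check. A minimal sketch, assuming the count includes the fix that just failed; `next_step` and its return strings are hypothetical names for illustration.

```python
def next_step(failed_fixes: int) -> str:
    """Decide what to do after a fix has failed.

    failed_fixes counts every failed attempt so far, including the latest.
    """
    if failed_fixes < 1:
        raise ValueError("only applies after at least one failed fix")
    if failed_fixes < 3:
        return "return to Phase 1"
    return "stop and question architecture"


if __name__ == "__main__":
    for attempts in (1, 2, 3, 4):
        print(attempts, "->", next_step(attempts))
```

The point of encoding the threshold is that the third failure changes the question from "what is the next fix?" to "is the design sound?", which is a conversation to have with the human partner.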

Red Flags - STOP and Follow Process

If you catch yourself thinking:

  • "Quick fix for now, investigate later"
  • "Just try changing X and see if it works"
  • "Add multiple changes, run tests"
  • "Skip the test, I'll manually verify"
  • "It's probably X, let me fix that"
  • "I don't fully understand but this might work"
  • "One more fix attempt" (when already tried 2+)

ALL mean: STOP. Return to Phase 1.

Human Partner Signals You're Doing It Wrong

  • "Is that not happening?" - Assumed without verifying
  • "Will it show us...?" - Should have added evidence gathering
  • "Stop guessing" - Proposing fixes without understanding
  • "Ultrathink this" - Question fundamentals, not just symptoms
  • "We're stuck?" (frustrated) - Approach isn't working

When you see these: STOP. Return to Phase 1.

Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too |
| "Emergency, no time for process" | Systematic is FASTER than guess-and-check |
| "Just try this first, then investigate" | First fix sets pattern. Do right from start |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem |

Real-World Impact

From debugging sessions:

  • Time to fix: 15-30 minutes (systematic) vs 2-3 hours of thrashing (random fixes)
  • First-time fix rate: 95% (systematic) vs 40% (random fixes)
  • New bugs introduced: near zero (systematic) vs common (random fixes)