english/.opencode/skills/ck-debug/references/root-cause-tracing.md
2026-04-12 01:06:31 +07:00

Root Cause Tracing

Systematically trace bugs backward through the call stack to find the original trigger.

Core Principle

Trace backward through the call chain until you find the original trigger, then fix it at the source.

Bugs often manifest deep in the call stack (git init in the wrong directory, a file created in the wrong location). The instinct is to fix the code where the error appears, but that only treats the symptom.

When to Use

Use when:

  • The error happens deep in execution (not at the entry point)
  • The stack trace shows a long call chain
  • It is unclear where the invalid data originated
  • You need to find which test or code path triggers the problem

The Tracing Process

1. Observe the Symptom

Error: git init failed in /Users/jesse/project/packages/core

2. Find Immediate Cause

What code directly causes this?

await execFileAsync('git', ['init'], { cwd: projectDir });

3. Ask: What Called This?

WorktreeManager.createSessionWorktree(projectDir, sessionId)
   called by Session.initializeWorkspace()
   called by Session.create()
   called by test at Project.create()

4. Keep Tracing Up

What value was passed?

  • projectDir = '' (empty string!)
  • Empty string as cwd resolves to process.cwd()
  • That's the source code directory!
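
The fallback above is easy to miss because nothing fails. As a rough illustration, the sketch below uses Node's path.resolve as a stand-in: it also ignores a zero-length segment and falls back to the current working directory. The helper name describeCwd is hypothetical, and the sketch does not inspect execFileAsync itself.

```typescript
import * as path from "node:path";

// Hypothetical helper (not from the original code base): reports where a
// command would effectively run for a given directory argument.
export function describeCwd(directory: string): {
  requested: string;
  effective: string;
  fellBack: boolean;
} {
  // path.resolve ignores zero-length segments, so '' resolves to process.cwd()
  const effective = path.resolve(directory);
  return { requested: directory, effective, fellBack: directory === "" };
}

// Log the result to stderr so it stays visible in test output
console.error("DEBUG cwd check:", describeCwd(""));
```

Logging both the requested and the effective directory makes the silent substitution visible at the point where the value is consumed.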

5. Find Original Trigger

Where did empty string come from?

const context = setupCoreTest(); // Returns { tempDir: '' }
Project.create('name', context.tempDir); // Accessed before beforeEach!

Adding Stack Traces

When you can't trace manually, add instrumentation:

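import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

async function gitInit(directory: string): Promise<void> {
  // Capture the call stack so the log shows who requested this git init
  const stack = new Error().stack;
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}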

Critical: Use console.error() in tests, not a logger, because logger output may be suppressed and never show up.

Run and capture:

npm test 2>&1 | grep 'DEBUG git init'

Analyze stack traces:

  • Look for test file names
  • Find line number triggering call
  • Identify pattern (same test? same parameter?)
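
When the captured output is long, the analysis steps above can be partly automated. The following is a minimal sketch, assuming V8-style stack frames ("at fn (file:line:column)") and test files named *.test.ts or similar; findTestFrames and StackFrame are hypothetical names, not part of the original code.

```typescript
// Hypothetical helper, not part of the original code base.
export interface StackFrame {
  file: string;
  line: number;
}

// Matches the "file:line:column" tail of a V8-style stack frame
const FRAME_LOCATION = /\(?([^()\s]+):(\d+):\d+\)?\s*$/;

export function findTestFrames(
  stack: string,
  testPattern: RegExp = /\.test\.[cm]?[jt]sx?$/,
): StackFrame[] {
  const frames: StackFrame[] = [];
  for (const rawLine of stack.split("\n")) {
    const match = FRAME_LOCATION.exec(rawLine.trim());
    if (match === null) continue; // not a frame with a source location
    const file = match[1] ?? "";
    const line = Number(match[2] ?? "0");
    // Keep only frames that point into a test file
    if (testPattern.test(file)) frames.push({ file, line });
  }
  return frames;
}
```

Calling findTestFrames(new Error().stack ?? "") inside the instrumented function would list the test files and line numbers on the current call path, which makes repeated offenders easier to spot.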

Finding Which Test Causes Pollution

If an unwanted file or directory appears during tests but you don't know which test creates it:

Use the bisection script: scripts/find-polluter.sh

./scripts/find-polluter.sh '.git' 'src/**/*.test.ts'

It runs the tests one by one and stops at the first polluter.
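
The search itself is simple enough to sketch. The version below is an illustration of the one-by-one loop described above, not a transcription of scripts/find-polluter.sh, whose details may differ. The callbacks runTest and artifactExists are assumptions, injected so that the loop stays independent of any particular test runner.

```typescript
// Hypothetical sketch of the one-by-one search; not the actual script.
export function findPolluter(
  testFiles: readonly string[],
  runTest: (file: string) => void, // assumed: runs a single test file
  artifactExists: () => boolean, // assumed: checks for the unwanted path, e.g. '.git'
): string | null {
  // A pre-existing artifact would make every test look guilty
  if (artifactExists()) {
    throw new Error("Artifact already exists; clean up before searching");
  }
  for (const file of testFiles) {
    runTest(file);
    // The first test after which the artifact exists is the polluter
    if (artifactExists()) return file;
  }
  return null; // no test created the artifact
}
```

Checking for the artifact before the loop starts matters: without that check, the first test file would be blamed for pollution left behind by an earlier run.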

Key Principle

NEVER fix only the place where the error appears. Trace back to find the original trigger.

Once you have found the immediate cause:

  • Can trace one level up? → Trace backwards
  • Is this the source? → Fix at source
  • Then add validation at each layer (see defense-in-depth.md)

Real Example

Symptom: .git created in packages/core/ (source code)

Trace chain:

  1. git init runs in process.cwd() ← empty cwd parameter
  2. WorktreeManager called with empty projectDir
  3. Session.create() passed empty string
  4. Test accessed context.tempDir before beforeEach
  5. setupCoreTest() returns { tempDir: '' } initially

Root cause: Top-level variable initialization read tempDir while it was still empty

Fix: Made tempDir a getter that throws if accessed before beforeEach
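
A fix of this kind might look roughly like the sketch below. The real setupCoreTest() is not shown in this document, so createTestContext, initialize, and reset are hypothetical names; in a real suite, initialize would be called from beforeEach and reset from afterEach.

```typescript
// Hypothetical reconstruction; the actual setupCoreTest() may differ.
export function createTestContext() {
  let tempDir: string | undefined;
  return {
    // Throws instead of silently handing out an empty or missing value
    get tempDir(): string {
      if (tempDir === undefined) {
        throw new Error("tempDir accessed before beforeEach ran");
      }
      return tempDir;
    },
    // Assumed to be called from beforeEach
    initialize(dir: string): void {
      tempDir = dir;
    },
    // Assumed to be called from afterEach
    reset(): void {
      tempDir = undefined;
    },
  };
}
```

With this shape, a top-level Project.create('name', context.tempDir) fails immediately with a message naming the mistake, instead of passing an empty string four layers down.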

Also added defense-in-depth:

  • Layer 1: Project.create() validates directory
  • Layer 2: WorktreeManager validates not empty
  • Layer 3: NODE_ENV guard refuses git init outside tmpdir
  • Layer 4: Stack trace logging before git init
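
The validation layers can be combined into a single guard, sketched below under stated assumptions: the function name, option names, and error messages are hypothetical, and the checks in the original code base are not shown in this document. The guard rejects empty directories everywhere and, when the environment is 'test', refuses any directory outside the temp root.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical guard illustrating the non-empty check and the NODE_ENV check.
export function assertSafeGitInitDir(
  directory: string,
  options: { env?: string; tmpRoot?: string } = {},
): string {
  const env = options.env ?? process.env.NODE_ENV;
  const tmpRoot = path.resolve(options.tmpRoot ?? os.tmpdir());

  // Reject empty input before it can fall back to process.cwd()
  if (directory.trim() === "") {
    throw new Error("git init refused: directory must not be empty");
  }
  const resolved = path.resolve(directory);

  // During tests, only allow directories at or below the temp root
  if (env === "test") {
    const relative = path.relative(tmpRoot, resolved);
    const escapes =
      relative === ".." ||
      relative.startsWith(".." + path.sep) ||
      path.isAbsolute(relative);
    if (escapes) {
      throw new Error(`git init refused outside ${tmpRoot}: ${resolved}`);
    }
  }
  return resolved;
}
```

Calling the guard immediately before the git init call keeps the check next to the dangerous operation, so a bad value is caught even if an upstream layer is bypassed.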