
Backend Debugging Strategies

Comprehensive debugging techniques, tools, and best practices for backend systems (2025).

Debugging Mindset

The Scientific Method for Debugging

  1. Observe - Gather symptoms and data
  2. Hypothesize - Form theories about the cause
  3. Test - Verify or disprove theories
  4. Iterate - Refine understanding
  5. Fix - Apply solution
  6. Verify - Confirm fix works

Golden Rules

  1. Reproduce first - Debugging without reproduction is guessing
  2. Simplify the problem - Isolate variables
  3. Read the logs - Error messages contain clues
  4. Check assumptions - "It should work" isn't debugging
  5. Use scientific method - Avoid random changes
  6. Document findings - Future you will thank you
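Rules 1 and 2 in practice: turn the report into the smallest script that still triggers the failure, then assert the wrong behavior so the reproduction is deterministic. A minimal sketch (`parse_price` and its inputs are hypothetical stand-ins for the code under investigation):

```python
def parse_price(text: str) -> float:
    # Buggy implementation under investigation: assumes a leading
    # currency symbol is always present and strips the first character.
    return float(text[1:])

# Happy path passes, which is why the bug slipped through review:
assert parse_price("$9.99") == 9.99

# Minimal failing input: a bare number silently loses its first digit.
# Asserting the *wrong* value pins the bug down deterministically.
assert parse_price("9.99") == 0.99

print("reproduced: bare numbers are mis-parsed")
```

Once the fix lands, flip the second assertion to the correct value and keep the script as a regression test.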

Logging Best Practices

Structured Logging

Node.js (Pino - High Performance)

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  transport: {
    target: 'pino-pretty',
    options: { colorize: true }
  }
});

// Structured logging with context
logger.info({ userId: '123', action: 'login' }, 'User logged in');

// Error logging with stack trace
try {
  await riskyOperation();
} catch (error) {
  logger.error({ err: error, userId: '123' }, 'Operation failed');
}

Python (Structlog)

import structlog

logger = structlog.get_logger()

# Structured context
logger.info("user_login", user_id="123", ip="192.168.1.1")

# Error with exception
try:
    risky_operation()
except Exception as e:
    logger.error("operation_failed", user_id="123", exc_info=True)

Go (Zap - High Performance)

import "go.uber.org/zap"

logger, _ := zap.NewProduction()
defer logger.Sync()

// Structured fields
logger.Info("user logged in",
    zap.String("user_id", "123"),
    zap.String("ip", "192.168.1.1"),
)

// Error logging
if err := riskyOperation(); err != nil {
    logger.Error("operation failed",
        zap.Error(err),
        zap.String("user_id", "123"),
    )
}

Log Levels

Level   Purpose                       Example
TRACE   Very detailed, dev only       Request/response bodies
DEBUG   Detailed info for debugging   SQL queries, cache hits
INFO    General informational events  User login, API calls
WARN    Potential issues              Deprecated API usage
ERROR   Error conditions              Failed API calls, exceptions
FATAL   Critical failures             Database connection lost

What to Log

DO LOG:

  • Request/response metadata (not bodies in prod)
  • Error messages with context
  • Performance metrics (duration, size)
  • Security events (login, permission changes)
  • Business events (orders, payments)

DON'T LOG:

  • Passwords or secrets
  • Credit card numbers
  • Personal identifiable information (PII)
  • Session tokens
  • Full request bodies in production
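One way to enforce the don't-log list is to scrub sensitive keys before a record reaches the logger. A minimal sketch (the key set is illustrative, not exhaustive):

```python
# Illustrative deny-list; extend it to match your domain's PII fields
SENSITIVE_KEYS = {"password", "token", "secret", "card_number", "ssn"}

def redact(record: dict) -> dict:
    """Return a copy of the record with sensitive values masked,
    recursing into nested dicts."""
    clean = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean

safe = redact({
    "user": "alice",
    "password": "hunter2",
    "payment": {"card_number": "4111111111111111", "amount": 19.99},
})
print(safe)
```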

Debugging Tools by Language

Node.js / TypeScript

1. Chrome DevTools (Built-in)

# Run with inspect flag
node --inspect-brk app.js

# Open chrome://inspect in Chrome
# Set breakpoints, step through code

2. VS Code Debugger

// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Debug Server",
      "skipFiles": ["<node_internals>/**"],
      "program": "${workspaceFolder}/src/index.ts",
      "preLaunchTask": "npm: build",
      "outFiles": ["${workspaceFolder}/dist/**/*.js"]
    }
  ]
}

3. Debug Module

import debug from 'debug';

const log = debug('app:server');
const error = debug('app:error');

log('Starting server on port %d', 3000);
error('Failed to connect to database');

// Run with: DEBUG=app:* node app.js

Python

1. PDB (Built-in Debugger)

import pdb

def problematic_function(data):
    # Set breakpoint (Python 3.7+ can use the built-in breakpoint() instead)
    pdb.set_trace()

    # Debugger commands:
    # l - list code
    # n - next line
    # s - step into
    # c - continue
    # p variable - print variable
    # q - quit
    result = process(data)
    return result

2. IPython embed (Richer Interactive Shell)

from IPython import embed

def problematic_function(data):
    # Drop into IPython shell
    embed()

    result = process(data)
    return result

3. VS Code Debugger

// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: FastAPI",
      "type": "python",
      "request": "launch",
      "module": "uvicorn",
      "args": ["main:app", "--reload"],
      "jinja": true
    }
  ]
}

Go

1. Delve (Standard Debugger)

# Install
go install github.com/go-delve/delve/cmd/dlv@latest

# Debug
dlv debug main.go

# Commands:
# b main.main - set breakpoint
# c - continue
# n - next line
# s - step into
# p variable - print variable
# q - quit

2. VS Code Debugger

// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Launch Package",
      "type": "go",
      "request": "launch",
      "mode": "debug",
      "program": "${workspaceFolder}"
    }
  ]
}

Rust

1. LLDB/GDB (Native Debuggers)

# Build with debug info
cargo build

# Debug with LLDB
rust-lldb ./target/debug/myapp

# Debug with GDB
rust-gdb ./target/debug/myapp

2. VS Code Debugger (CodeLLDB)

// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "lldb",
      "request": "launch",
      "name": "Debug",
      "program": "${workspaceFolder}/target/debug/myapp",
      "args": [],
      "cwd": "${workspaceFolder}"
    }
  ]
}

Database Debugging

SQL Query Debugging (PostgreSQL)

1. EXPLAIN ANALYZE

-- Show query execution plan and actual timings
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;

-- Look for:
-- - Seq Scan on large tables (missing indexes)
-- - High execution time
-- - Large row estimates

2. Enable Slow Query Logging

-- PostgreSQL configuration
ALTER DATABASE mydb SET log_min_duration_statement = 1000; -- Log queries >1s

-- Check slow queries (requires the pg_stat_statements extension)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

3. Active Query Monitoring

-- See currently running queries
SELECT pid, now() - query_start as duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

-- Kill a long-running query (substitute the pid from the query above)
SELECT pg_terminate_backend(12345);

MongoDB Debugging

1. Explain Query Performance

db.users.find({ email: 'test@example.com' }).explain('executionStats')

// Look for:
// - totalDocsExamined vs nReturned (should be close)
// - COLLSCAN (collection scan - needs index)
// - executionTimeMillis (should be low)

2. Profile Slow Queries

// Enable profiling for queries >100ms
db.setProfilingLevel(1, { slowms: 100 })

// View slow queries
db.system.profile.find().limit(5).sort({ ts: -1 }).pretty()

// Disable profiling
db.setProfilingLevel(0)

Redis Debugging

1. Monitor Commands

# See all commands in real-time
redis-cli MONITOR

# Check slow log
redis-cli SLOWLOG GET 10

# Set slow log threshold (microseconds)
redis-cli CONFIG SET slowlog-log-slower-than 10000

2. Memory Analysis

# Find the biggest keys per data type
redis-cli --bigkeys

# Memory usage details
redis-cli INFO memory

# Analyze specific key
redis-cli MEMORY USAGE mykey

API Debugging

HTTP Request Debugging

1. cURL Testing

# Verbose output with headers
curl -v https://api.example.com/users

# Include response headers
curl -i https://api.example.com/users

# POST with JSON
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name":"John","email":"john@example.com"}' \
  -v

# Save response to file
curl https://api.example.com/users -o response.json

2. HTTPie (User-Friendly)

# Install
pip install httpie

# Simple GET
http GET https://api.example.com/users

# POST with JSON
http POST https://api.example.com/users name=John email=john@example.com

# Custom headers
http GET https://api.example.com/users Authorization:"Bearer token123"

3. Request Logging Middleware

Express/Node.js:

import morgan from 'morgan';

// Development
app.use(morgan('dev'));

// Production (Apache combined format)
app.use(morgan('combined'));

// Custom format
app.use(morgan(':method :url :status :response-time ms - :res[content-length]'));

FastAPI/Python:

from fastapi import Request
import time

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time

    logger.info(
        "request_processed",
        method=request.method,
        path=request.url.path,
        status_code=response.status_code,
        duration_ms=duration * 1000
    )
    return response

Performance Debugging

CPU Profiling

Node.js (0x)

# Install
npm install -g 0x

# Profile application
0x node app.js

# Open flamegraph in browser
# Identify hot spots (red areas)

Node.js (Clinic.js)

# Install
npm install -g clinic

# Overall diagnosis (event loop delay, CPU, memory)
clinic doctor -- node app.js

# CPU flamegraph
clinic flame -- node app.js

# Heap profiling
clinic heapprofiler -- node app.js

# Async flow / event loop analysis
clinic bubbleprof -- node app.js

Python (cProfile)

import cProfile
import pstats

# Profile function
profiler = cProfile.Profile()
profiler.enable()

# Your code
result = expensive_operation()

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)  # Top 10 functions

Go (pprof)

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // Enable profiling endpoint
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // Your application
    startServer()
}

// Profile CPU
// go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

// Profile heap
// go tool pprof http://localhost:6060/debug/pprof/heap

Memory Debugging

Node.js (Heap Snapshots)

// Take heap snapshot programmatically
import { writeHeapSnapshot } from 'v8';

app.get('/debug/heap', (req, res) => {
    const filename = writeHeapSnapshot();
    res.send(`Heap snapshot written to ${filename}`);
});

// Analyze in Chrome DevTools
// 1. Load heap snapshot
// 2. Compare snapshots to find memory leaks
// 3. Look for growing arrays, maps, and closures retaining references

Python (Memory Profiler)

from memory_profiler import profile

@profile
def memory_intensive_function():
    large_list = [i for i in range(1000000)]
    return sum(large_list)

# Run with: python -m memory_profiler script.py
# Shows line-by-line memory usage

Production Debugging

Application Performance Monitoring (APM)

New Relic

// newrelic.js (CommonJS: the agent loads this file itself)
'use strict';
exports.config = {
  app_name: ['My Backend API'],
  license_key: process.env.NEW_RELIC_LICENSE_KEY,
  logging: { level: 'info' },
  distributed_tracing: { enabled: true },
};

// Load at app entry, before anything else
import 'newrelic';

DataDog

import tracer from 'dd-trace';

tracer.init({
  service: 'backend-api',
  env: process.env.NODE_ENV,
  version: '1.0.0',
  logInjection: true
});

Sentry (Error Tracking)

import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 1.0,
});

// Capture errors
try {
  await riskyOperation();
} catch (error) {
  Sentry.captureException(error, {
    user: { id: userId },
    tags: { operation: 'payment' },
  });
}

Distributed Tracing

OpenTelemetry (Vendor-Agnostic)

import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

const sdk = new NodeSDK({
  traceExporter: new JaegerExporter({
    endpoint: 'http://localhost:14268/api/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Traces HTTP, database, Redis automatically

Log Aggregation

ELK Stack (Elasticsearch, Logstash, Kibana)

# docker-compose.yml
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - 5601:5601

Loki + Grafana (Lightweight)

# promtail config for log shipping
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: backend-api
          __path__: /var/log/app/*.log

Common Debugging Scenarios

1. High CPU Usage

Steps:

  1. Profile CPU (flamegraph)
  2. Identify hot functions
  3. Check for:
    • Infinite loops
    • Heavy regex operations
    • Inefficient algorithms (O(n²))
    • Blocking operations in event loop (Node.js)

Node.js Example:

// ❌ Bad: Blocking event loop
function fibonacci(n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2); // Exponential time
}

// ✅ Good: Memoized or iterative
const memo = new Map();
function fibonacciMemo(n) {
  if (n <= 1) return n;
  if (memo.has(n)) return memo.get(n);
  const result = fibonacciMemo(n - 1) + fibonacciMemo(n - 2);
  memo.set(n, result);
  return result;
}

2. Memory Leaks

Symptoms:

  • Memory usage grows over time
  • Eventually crashes (OOM)
  • Performance degradation

Common Causes:

// ❌ Memory leak: Event listeners not removed
class DataService {
  constructor(eventBus) {
    eventBus.on('data', (data) => this.processData(data));
    // Listener never removed, holds reference to DataService
  }
}

// ✅ Fix: Remove listeners
class DataService {
  constructor(eventBus) {
    this.eventBus = eventBus;
    this.handler = (data) => this.processData(data);
    eventBus.on('data', this.handler);
  }

  destroy() {
    this.eventBus.off('data', this.handler);
  }
}

// ❌ Memory leak: Global cache without limits
const cache = new Map();
function getCachedData(key) {
  if (!cache.has(key)) {
    cache.set(key, expensiveOperation(key)); // Grows forever
  }
  return cache.get(key);
}

// ✅ Fix: LRU cache with size limit
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 1000, ttl: 1000 * 60 * 60 });

Detection:

# Node.js: cap the heap and expose GC to watch usage over time
node --expose-gc --max-old-space-size=4096 app.js

# Take periodic heap snapshots
# Compare snapshots in Chrome DevTools
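In Python, the stdlib `tracemalloc` module supports the same snapshot-comparison workflow without external tooling. A sketch:

```python
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

# Simulate a leak: a collection that only ever grows
leak = []
for i in range(50_000):
    leak.append("payload-%d" % i)

current = tracemalloc.take_snapshot()

# Largest allocation growth since the baseline, grouped by source line
top = current.compare_to(baseline, "lineno")
for stat in top[:3]:
    print(stat)
```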

3. Slow Database Queries

Steps:

  1. Enable slow query log
  2. Analyze with EXPLAIN
  3. Add indexes
  4. Optimize query

PostgreSQL Example:

-- Before: Slow full table scan
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 10;

-- EXPLAIN shows: Seq Scan on orders

-- Fix: Add index
CREATE INDEX idx_orders_user_id_created_at
ON orders(user_id, created_at DESC);

-- After: Index Scan using idx_orders_user_id_created_at
-- (typically orders of magnitude faster on a large table)

4. Connection Pool Exhaustion

Symptoms:

  • "Connection pool exhausted" errors
  • Requests hang indefinitely
  • Database connections at max

Causes & Fixes:

// ❌ Bad: Connection leak
async function getUser(id) {
  const client = await pool.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
  // Connection never released!
}

// ✅ Good: Always release
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // Always release
  }
}

// ✅ Better: Use pool directly
async function getUser(id) {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
  // Automatically releases
}

5. Race Conditions

Example:

// ❌ Bad: Race condition
let counter = 0;

async function incrementCounter() {
  const current = counter; // Request A reads 0
  await doSomethingAsync(); // Request B also reads 0 while A is suspended
  counter = current + 1; // A writes 1, then B writes 1
  // Expected: 2, Actual: 1
}

// ✅ Fix: Atomic operations (Redis)
async function incrementCounter() {
  return await redis.incr('counter');
  // Atomic, thread-safe
}

// ✅ Fix: Database transactions
async function incrementCounter(userId) {
  await db.transaction(async (trx) => {
    const user = await trx('users')
      .where({ id: userId })
      .forUpdate() // Row-level lock
      .first();

    await trx('users')
      .where({ id: userId })
      .update({ counter: user.counter + 1 });
  });
}

Debugging Checklist

Before Diving Into Code:

  • Read error message completely
  • Check logs for context
  • Reproduce the issue reliably
  • Isolate the problem (binary search)
  • Verify assumptions

Investigation:

  • Enable debug logging
  • Add strategic log points
  • Use debugger breakpoints
  • Profile performance if slow
  • Check database queries
  • Monitor system resources

Production Issues:

  • Check APM dashboards
  • Review distributed traces
  • Analyze error rates
  • Compare with previous baseline
  • Check for recent deployments
  • Review infrastructure changes

After Fix:

  • Verify fix in development
  • Add regression test
  • Document the issue
  • Deploy with monitoring
  • Confirm fix in production
