# Backend Debugging Strategies

Comprehensive debugging techniques, tools, and best practices for backend systems (2025).

## Debugging Mindset

### The Scientific Method for Debugging

1. **Observe** - Gather symptoms and data
2. **Hypothesize** - Form theories about the cause
3. **Test** - Verify or disprove theories
4. **Iterate** - Refine understanding
5. **Fix** - Apply solution
6. **Verify** - Confirm fix works

### Golden Rules

1. **Reproduce first** - Debugging without reproduction is guessing
2. **Simplify the problem** - Isolate variables
3. **Read the logs** - Error messages contain clues
4. **Check assumptions** - "It should work" isn't debugging
5. **Use the scientific method** - Avoid random changes
6. **Document findings** - Future you will thank you

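"Reproduce first" is easiest to honor when the reproduction is a small, deterministic script that asserts the expected behavior. A minimal sketch of that pattern (the `slugify` helper, its input, and the expected output are all invented for illustration):

```typescript
// Minimal repro script: pin down the failing input and the expected result,
// so the bug can be confirmed before and after the fix. `slugify` stands in
// for whatever function is under suspicion.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, ''); // strip leading/trailing dashes
}

// The smallest input that triggers the reported symptom
const input = '  Hello, World!  ';
const expected = 'hello-world';
const actual = slugify(input);

console.log(actual === expected ? 'PASS' : `FAIL: got "${actual}"`);
```

Once the script fails reliably, every later step — isolating, fixing, verifying — can be checked by simply re-running it.
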
## Logging Best Practices

### Structured Logging

**Node.js (Pino)**
```typescript
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  transport: {
    target: 'pino-pretty', // pretty-printing for development only
    options: { colorize: true }
  }
});

// Structured logging with context
logger.info({ userId: '123', action: 'login' }, 'User logged in');

// Error logging with stack trace
try {
  await riskyOperation();
} catch (error) {
  logger.error({ err: error, userId: '123' }, 'Operation failed');
}
```

**Python (Structlog)**
```python
import structlog

logger = structlog.get_logger()

# Structured context
logger.info("user_login", user_id="123", ip="192.168.1.1")

# Error with exception info attached
try:
    risky_operation()
except Exception:
    logger.error("operation_failed", user_id="123", exc_info=True)
```

**Go (Zap - High Performance)**
```go
import "go.uber.org/zap"

logger, _ := zap.NewProduction()
defer logger.Sync()

// Structured fields
logger.Info("user logged in",
	zap.String("user_id", "123"),
	zap.String("ip", "192.168.1.1"),
)

// Error logging
if err := riskyOperation(); err != nil {
	logger.Error("operation failed",
		zap.Error(err),
		zap.String("user_id", "123"),
	)
}
```

### Log Levels

| Level | Purpose | Example |
|-------|---------|---------|
| **TRACE** | Very detailed, dev only | Request/response bodies |
| **DEBUG** | Detailed info for debugging | SQL queries, cache hits |
| **INFO** | General informational events | User login, API calls |
| **WARN** | Potential issues | Deprecated API usage |
| **ERROR** | Error conditions | Failed API calls, exceptions |
| **FATAL** | Critical failures | Database connection lost |

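A logger emits a record only when its level is at or above the configured threshold. A dependency-free sketch of that filtering, using the level ordering from the table above (the `makeLogger` helper is illustrative, not a real library API):

```typescript
// Minimal sketch of level-threshold filtering; real loggers (Pino, Zap)
// implement the same idea internally. All names here are illustrative.
const LEVELS = ['trace', 'debug', 'info', 'warn', 'error', 'fatal'] as const;
type Level = (typeof LEVELS)[number];

function makeLogger(threshold: Level) {
  const min = LEVELS.indexOf(threshold);
  return (level: Level, msg: string): string | null => {
    // Drop anything below the configured threshold
    if (LEVELS.indexOf(level) < min) return null;
    return JSON.stringify({ level, msg, time: Date.now() });
  };
}

const log = makeLogger('info');
log('debug', 'cache hit');      // below threshold -> null
log('error', 'payment failed'); // emitted as a JSON line
```

This is also why `LOG_LEVEL=debug` can be flipped on in production temporarily: the records are always produced at the call site, but only those above the threshold are serialized and shipped.
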
### What to Log

**✅ DO LOG:**
- Request/response metadata (not bodies in prod)
- Error messages with context
- Performance metrics (duration, size)
- Security events (login, permission changes)
- Business events (orders, payments)

**❌ DON'T LOG:**
- Passwords or secrets
- Credit card numbers
- Personally identifiable information (PII)
- Session tokens
- Full request bodies in production

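One way to enforce the DON'T list mechanically is to scrub known-sensitive fields before a record ever reaches the logger. A minimal sketch (the field names are illustrative; Pino users can get a similar effect with its built-in `redact` option):

```typescript
// Recursively mask sensitive fields before logging.
// The SENSITIVE set is illustrative; extend it for your domain.
const SENSITIVE = new Set(['password', 'token', 'authorization', 'cardNumber', 'ssn']);

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SENSITIVE.has(k) ? '[REDACTED]' : redact(v);
    }
    return out;
  }
  return value;
}

const safe = redact({
  userId: '123',
  password: 'hunter2',
  meta: { token: 'abc', ip: '192.168.1.1' },
});
// safe.password and safe.meta.token are now '[REDACTED]'
```

Name-based masking is a safety net, not a guarantee — it won't catch secrets embedded in free-text messages, so the DO/DON'T discipline above still applies.
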
## Debugging Tools by Language

### Node.js / TypeScript

**1. Chrome DevTools (Built-in)**
```bash
# Run with the inspect flag (breaks before the first line of user code)
node --inspect-brk app.js

# Open chrome://inspect in Chrome
# Set breakpoints, step through code
```

**2. VS Code Debugger**
```json
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Debug Server",
      "skipFiles": ["<node_internals>/**"],
      "program": "${workspaceFolder}/src/index.ts",
      "preLaunchTask": "npm: build",
      "outFiles": ["${workspaceFolder}/dist/**/*.js"]
    }
  ]
}
```

**3. Debug Module**
```typescript
import debug from 'debug';

const log = debug('app:server');
const error = debug('app:error');

log('Starting server on port %d', 3000);
error('Failed to connect to database');

// Run with: DEBUG=app:* node app.js
```

### Python

**1. PDB (Built-in Debugger)**
```python
import pdb

def problematic_function(data):
    # Set breakpoint (Python 3.7+: the built-in breakpoint() does the same)
    pdb.set_trace()

    # Debugger commands:
    # l - list code
    # n - next line
    # s - step into
    # c - continue
    # p variable - print variable
    # q - quit
    result = process(data)
    return result
```

**2. IPython `embed()` (Richer Interactive Inspection)**
```python
from IPython import embed

def problematic_function(data):
    # Drop into an IPython shell with access to the local scope
    embed()

    result = process(data)
    return result
```

**3. VS Code Debugger**
```json
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: FastAPI",
      "type": "python",
      "request": "launch",
      "module": "uvicorn",
      "args": ["main:app", "--reload"],
      "jinja": true
    }
  ]
}
```

### Go

**1. Delve (Standard Debugger)**
```bash
# Install
go install github.com/go-delve/delve/cmd/dlv@latest

# Debug
dlv debug main.go

# Commands:
# b main.main - set breakpoint
# c - continue
# n - next line
# s - step into
# p variable - print variable
# q - quit
```

**2. VS Code Debugger**
```json
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Launch Package",
      "type": "go",
      "request": "launch",
      "mode": "debug",
      "program": "${workspaceFolder}"
    }
  ]
}
```

### Rust

**1. LLDB/GDB (Native Debuggers)**
```bash
# Build with debug info (the default dev profile)
cargo build

# Debug with LLDB
rust-lldb ./target/debug/myapp

# Debug with GDB
rust-gdb ./target/debug/myapp
```

**2. VS Code Debugger (CodeLLDB)**
```json
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "lldb",
      "request": "launch",
      "name": "Debug",
      "program": "${workspaceFolder}/target/debug/myapp",
      "args": [],
      "cwd": "${workspaceFolder}"
    }
  ]
}
```

## Database Debugging

### SQL Query Debugging (PostgreSQL)

**1. EXPLAIN ANALYZE**
```sql
-- Show query execution plan and actual timings
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;

-- Look for:
-- - Seq Scan on large tables (missing indexes)
-- - High execution time
-- - Row estimates far from actual row counts
```

**2. Enable Slow Query Logging**
```sql
-- PostgreSQL configuration
ALTER DATABASE mydb SET log_min_duration_statement = 1000; -- Log queries >1s

-- Check slow queries (requires the pg_stat_statements extension)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

**3. Active Query Monitoring**
```sql
-- See currently running queries
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

-- Kill a long-running query (substitute the pid from the query above)
SELECT pg_terminate_backend(12345);
```

### MongoDB Debugging

**1. Explain Query Performance**
```javascript
db.users.find({ email: 'test@example.com' }).explain('executionStats')

// Look for:
// - totalDocsExamined vs nReturned (should be close)
// - COLLSCAN (collection scan - needs an index)
// - executionTimeMillis (should be low)
```

**2. Profile Slow Queries**
```javascript
// Enable profiling for queries >100ms
db.setProfilingLevel(1, { slowms: 100 })

// View the most recent slow queries
db.system.profile.find().limit(5).sort({ ts: -1 }).pretty()

// Disable profiling
db.setProfilingLevel(0)
```

### Redis Debugging

**1. Monitor Commands**
```bash
# See all commands in real-time (expensive; avoid on busy production instances)
redis-cli MONITOR

# Check slow log
redis-cli SLOWLOG GET 10

# Set slow log threshold (microseconds)
redis-cli CONFIG SET slowlog-log-slower-than 10000
```

**2. Memory Analysis**
```bash
# Sample the keyspace for the biggest keys of each type
redis-cli --bigkeys

# Memory usage details
redis-cli INFO memory

# Analyze a specific key
redis-cli MEMORY USAGE mykey
```

## API Debugging

### HTTP Request Debugging

**1. cURL Testing**
```bash
# Verbose output with request/response details
curl -v https://api.example.com/users

# Include response headers in the output
curl -i https://api.example.com/users

# POST with JSON
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name":"John","email":"john@example.com"}' \
  -v

# Save response to file
curl https://api.example.com/users -o response.json
```

**2. HTTPie (User-Friendly)**
```bash
# Install
pip install httpie

# Simple GET
http GET https://api.example.com/users

# POST with JSON
http POST https://api.example.com/users name=John email=john@example.com

# Custom headers
http GET https://api.example.com/users Authorization:"Bearer token123"
```

**3. Request Logging Middleware**

**Express/Node.js:**
```typescript
import morgan from 'morgan';

// Development (concise, colorized)
app.use(morgan('dev'));

// Production (Apache combined log format)
app.use(morgan('combined'));

// Custom format
app.use(morgan(':method :url :status :response-time ms - :res[content-length]'));
```

**FastAPI/Python:**
```python
import time

from fastapi import Request

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time

    logger.info(
        "request_processed",
        method=request.method,
        path=request.url.path,
        status_code=response.status_code,
        duration_ms=duration * 1000,
    )
    return response
```

## Performance Debugging

### CPU Profiling

**Node.js (0x)**
```bash
# Install
npm install -g 0x

# Profile the application (produces a flamegraph)
0x app.js

# Open the generated flamegraph in a browser
# Hot code paths show up as wide, hot-colored frames
```

**Node.js (Clinic.js)**
```bash
# Install
npm install -g clinic

# Overall health diagnosis (CPU, memory, event loop delay)
clinic doctor -- node app.js

# CPU flamegraph
clinic flame -- node app.js

# Heap profiling
clinic heapprofiler -- node app.js

# Event loop / async flow analysis
clinic bubbleprof -- node app.js
```

**Python (cProfile)**
```python
import cProfile
import pstats

# Profile a block of code
profiler = cProfile.Profile()
profiler.enable()

# Your code
result = expensive_operation()

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)  # Top 10 functions
```

**Go (pprof)**
```go
import (
	"log"
	"net/http"
	_ "net/http/pprof"
)

func main() {
	// Expose profiling endpoints at localhost:6060/debug/pprof/
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// Your application
	startServer()
}

// Profile CPU:
// go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

// Profile heap:
// go tool pprof http://localhost:6060/debug/pprof/heap
```

### Memory Debugging

**Node.js (Heap Snapshots)**
```typescript
// Take a heap snapshot programmatically
import { writeHeapSnapshot } from 'v8';

app.get('/debug/heap', (req, res) => {
  const filename = writeHeapSnapshot();
  res.send(`Heap snapshot written to ${filename}`);
});

// Analyze in Chrome DevTools (Memory tab):
// 1. Load the heap snapshot
// 2. Compare snapshots taken over time to find leaks
// 3. Look for growing arrays, maps, and closures retaining large objects
```

**Python (Memory Profiler)**
```python
from memory_profiler import profile

@profile
def memory_intensive_function():
    large_list = [i for i in range(1000000)]
    return sum(large_list)

# Run with: python -m memory_profiler script.py
# Shows line-by-line memory usage
```

## Production Debugging

### Application Performance Monitoring (APM)

**New Relic**
```typescript
// newrelic.js — CommonJS config file read by the agent at startup
'use strict';
exports.config = {
  app_name: ['My Backend API'],
  license_key: process.env.NEW_RELIC_LICENSE_KEY,
  logging: { level: 'info' },
  distributed_tracing: { enabled: true },
};

// Load the agent first thing at the app entry point
import 'newrelic';
```

**Datadog**
```typescript
// Initialize the tracer before importing anything it should instrument
import tracer from 'dd-trace';

tracer.init({
  service: 'backend-api',
  env: process.env.NODE_ENV,
  version: '1.0.0',
  logInjection: true
});
```

**Sentry (Error Tracking)**
```typescript
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 1.0, // sample everything; lower this under heavy traffic
});

// Capture errors with context
try {
  await riskyOperation();
} catch (error) {
  Sentry.captureException(error, {
    user: { id: userId },
    tags: { operation: 'payment' },
  });
}
```

### Distributed Tracing

**OpenTelemetry (Vendor-Agnostic)**
```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

const sdk = new NodeSDK({
  traceExporter: new JaegerExporter({
    endpoint: 'http://localhost:14268/api/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Auto-instruments HTTP, database drivers, Redis, and more
```

### Log Aggregation

**ELK Stack (Elasticsearch, Logstash, Kibana)**
```yaml
# docker-compose.yml
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
```

**Loki + Grafana (Lightweight)**
```yaml
# Promtail config for log shipping
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: backend-api
          __path__: /var/log/app/*.log
```

## Common Debugging Scenarios

### 1. High CPU Usage

**Steps:**
1. Profile CPU (flamegraph)
2. Identify hot functions
3. Check for:
   - Infinite loops
   - Heavy regex operations
   - Inefficient algorithms (O(n²))
   - Blocking operations in the event loop (Node.js)

**Node.js Example:**
```typescript
// ❌ Bad: Blocks the event loop with exponential-time recursion
function fibonacci(n: number): number {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2); // O(2^n)
}

// ✅ Good: Memoized — linear time
const memo = new Map<number, number>();
function fibonacciMemo(n: number): number {
  if (n <= 1) return n;
  if (memo.has(n)) return memo.get(n)!;
  const result = fibonacciMemo(n - 1) + fibonacciMemo(n - 2);
  memo.set(n, result);
  return result;
}
```

### 2. Memory Leaks

**Symptoms:**
- Memory usage grows over time
- Eventually crashes (OOM)
- Performance degradation

**Common Causes:**
```typescript
// ❌ Memory leak: Event listeners never removed
class DataService {
  constructor(eventBus) {
    eventBus.on('data', (data) => this.processData(data));
    // Listener is never removed and holds a reference to DataService
  }
}

// ✅ Fix: Keep a reference and remove the listener on teardown
class DataService {
  constructor(eventBus) {
    this.eventBus = eventBus;
    this.handler = (data) => this.processData(data);
    eventBus.on('data', this.handler);
  }

  destroy() {
    this.eventBus.off('data', this.handler);
  }
}

// ❌ Memory leak: Global cache without limits
const cache = new Map();
function getCachedData(key) {
  if (!cache.has(key)) {
    cache.set(key, expensiveOperation(key)); // Grows forever
  }
  return cache.get(key);
}

// ✅ Fix: LRU cache with a size limit and TTL
import { LRUCache } from 'lru-cache'; // v7+ API
const cache = new LRUCache({ max: 1000, ttl: 1000 * 60 * 60 });
```

**Detection:**
```bash
# Node.js: Raise the heap limit and expose gc for controlled experiments
node --expose-gc --max-old-space-size=4096 app.js

# Take periodic heap snapshots
# Compare snapshots in Chrome DevTools
```

### 3. Slow Database Queries

**Steps:**
1. Enable the slow query log
2. Analyze with EXPLAIN
3. Add indexes
4. Optimize the query

**PostgreSQL Example:**
```sql
-- Before: Slow full table scan
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 10;

-- EXPLAIN shows: Seq Scan on orders

-- Fix: Add a composite index matching the filter and sort
CREATE INDEX idx_orders_user_id_created_at
ON orders(user_id, created_at DESC);

-- After: Index Scan using idx_orders_user_id_created_at
-- Often orders of magnitude faster on large tables
```

### 4. Connection Pool Exhaustion

**Symptoms:**
- "Connection pool exhausted" errors
- Requests hang indefinitely
- Database connections at max

**Causes & Fixes:**
```typescript
// ❌ Bad: Connection leak
async function getUser(id) {
  const client = await pool.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
  // Connection never released!
}

// ✅ Good: Always release in a finally block
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // Runs even if the query throws
  }
}

// ✅ Better: For single queries, let the pool manage checkout/release
async function getUser(id) {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
  // pool.query acquires and releases automatically
}
```

### 5. Race Conditions

**Example:**
```typescript
// ❌ Bad: Read-modify-write race between concurrent requests
let counter = 0;

async function incrementCounter() {
  const current = counter;   // request A reads 0
  await doSomethingAsync();  // meanwhile request B also reads 0
  counter = current + 1;     // both write 1
  // Expected: 2, Actual: 1
}

// ✅ Fix: Atomic operations (Redis)
async function incrementCounter() {
  return await redis.incr('counter'); // atomic on the server
}

// ✅ Fix: Database transaction with a row-level lock
async function incrementCounter(userId) {
  await db.transaction(async (trx) => {
    const user = await trx('users')
      .where({ id: userId })
      .forUpdate() // SELECT ... FOR UPDATE row-level lock
      .first();

    await trx('users')
      .where({ id: userId })
      .update({ counter: user.counter + 1 });
  });
}
```

## Debugging Checklist

**Before Diving Into Code:**
- [ ] Read the error message completely
- [ ] Check logs for context
- [ ] Reproduce the issue reliably
- [ ] Isolate the problem (binary search)
- [ ] Verify assumptions

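The "binary search" step applies to anything ordered — commits, config changes, middleware entries. A sketch of the idea, which is what `git bisect` automates over commit history (the `firstBad` helper and commit names are illustrative):

```typescript
// Binary-search for the first "bad" change in an ordered list,
// assuming the last entry is known bad and the bug appeared at one point.
function firstBad(changes: string[], isBad: (change: string) => boolean): string {
  let lo = 0;
  let hi = changes.length - 1; // invariant: changes[hi] is known bad
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(changes[mid])) {
      hi = mid;     // bug is at mid or earlier
    } else {
      lo = mid + 1; // bug was introduced after mid
    }
  }
  return changes[lo];
}

// 10 commits; the bug shipped in c6 — found in ~log2(10) checks, not 10
const commits = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10'];
firstBad(commits, (c) => Number(c.slice(1)) >= 6); // → 'c6'
```

The `isBad` predicate is whatever check confirms the symptom — a failing test, a curl against a staging deploy, a log grep.
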
**Investigation:**
- [ ] Enable debug logging
- [ ] Add strategic log points
- [ ] Use debugger breakpoints
- [ ] Profile performance if slow
- [ ] Check database queries
- [ ] Monitor system resources

**Production Issues:**
- [ ] Check APM dashboards
- [ ] Review distributed traces
- [ ] Analyze error rates
- [ ] Compare with previous baseline
- [ ] Check for recent deployments
- [ ] Review infrastructure changes

**After Fix:**
- [ ] Verify fix in development
- [ ] Add regression test
- [ ] Document the issue
- [ ] Deploy with monitoring
- [ ] Confirm fix in production

## Debugging Resources

**Tools:**
- Node.js debugging guide: https://nodejs.org/en/docs/guides/debugging-getting-started/
- Chrome DevTools: https://developer.chrome.com/docs/devtools/
- Clinic.js: https://clinicjs.org/
- Sentry: https://docs.sentry.io/
- Datadog: https://docs.datadoghq.com/
- New Relic: https://docs.newrelic.com/

**Best Practices:**
- The Twelve-Factor App (Logs): https://12factor.net/logs
- Google SRE Book: https://sre.google/sre-book/table-of-contents/
- OpenTelemetry: https://opentelemetry.io/docs/

**Database:**
- PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
- MongoDB Performance: https://www.mongodb.com/docs/manual/administration/analyzing-mongodb-performance/