2026-04-12 01:06:31 +07:00


# Backend DevOps Practices
CI/CD pipelines, containerization, deployment strategies, and monitoring (2025).
## Deployment Strategies
### Blue-Green Deployment
**Concept:** Two identical environments (Blue = current, Green = new)
```
Production Traffic → Blue  (v1.0)
                     Green (v2.0) ← Deploy & Test

Switch:
Production Traffic → Green (v2.0)
                     Blue  (v1.0) ← Instant rollback available
```
**Pros:**
- Zero downtime
- Instant rollback
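- Full environment testing before switch

**Cons:**
- Requires double infrastructure
- Database migrations are harder: both versions typically share one database during the switch (and after a rollback), so schema changes must stay backward-compatible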
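
The cutover logic can be modeled as a router that keeps one color live and flips only after the idle environment checks out. The following is a minimal TypeScript sketch under stated assumptions. `BlueGreenRouter` and the `isHealthy` callback are hypothetical names. In practice, the flip happens at a load balancer, DNS record, or Kubernetes Service selector rather than in application code.

```typescript
// Illustrative model only: BlueGreenRouter is a hypothetical name; real cutovers
// switch a load balancer, DNS record, or Service selector
type Color = 'blue' | 'green';

interface Environment {
  version: string;
  // Hypothetical health probe for the environment (e.g. smoke tests passed)
  isHealthy: () => boolean;
}

class BlueGreenRouter {
  private live: Color = 'blue';
  private readonly envs: Record<Color, Environment>;

  constructor(envs: Record<Color, Environment>) {
    this.envs = envs;
  }

  // Version currently receiving production traffic
  liveVersion(): string {
    return this.envs[this.live].version;
  }

  private idle(): Color {
    return this.live === 'blue' ? 'green' : 'blue';
  }

  // Cut traffic over to the idle environment only if it is healthy
  switchTraffic(): boolean {
    const target = this.idle();
    if (!this.envs[target].isHealthy()) return false;
    this.live = target;
    return true;
  }

  // Rollback is the same flip in reverse: the previous version is still running
  rollback(): void {
    this.live = this.idle();
  }
}

const router = new BlueGreenRouter({
  blue: { version: 'v1.0', isHealthy: () => true },
  green: { version: 'v2.0', isHealthy: () => true },
});
router.switchTraffic();
console.log(router.liveVersion()); // v2.0
router.rollback();
console.log(router.liveVersion()); // v1.0
```

Because the old environment keeps running, rollback costs nothing more than a pointer flip. This is also why the pattern doubles infrastructure cost.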
### Canary Deployment
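**Concept:** Route a small, growing share of traffic to the new version (e.g. 1% → 5% → 25% → 100%), and promote only while metrics stay healthy. Fine-grained percentages need traffic splitting at the ingress or service-mesh layer, or a controller such as Argo Rollouts or Flagger. Pausing a native Deployment rollout, as below, only shifts traffic in whole-pod steps.
```bash
# Approximate canary using a native Deployment rolling update
kubectl set image deployment/api-deployment api=myapp:v2
kubectl rollout pause deployment/api-deployment  # Freeze the rollout part-way
# How many pods already run v2 depends on timing and maxSurge/maxUnavailable
# Monitor metrics, then continue
kubectl rollout resume deployment/api-deployment
kubectl rollout status deployment/api-deployment
```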
**Pros:**
- Risk mitigation
- Early issue detection
- Real user feedback

**Cons:**
- Requires monitoring and alerting good enough to judge each step
- Longer deployment time
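
The promotion loop behind a 1% → 5% → 25% → 100% schedule reduces to one decision per stage. This sketch assumes an illustrative error-rate threshold of 1%. `nextCanaryStep` is a hypothetical helper, and the observed error rate would come from your monitoring system (for example, a Prometheus query).

```typescript
// Traffic percentages for each canary stage (illustrative schedule)
const CANARY_STEPS: readonly number[] = [1, 5, 25, 100];

type Decision =
  | { action: 'promote'; nextPercent: number }
  | { action: 'complete' }
  | { action: 'rollback' };

// Hypothetical helper: decide the next move after observing the canary at `currentPercent`
function nextCanaryStep(currentPercent: number, canaryErrorRate: number, maxErrorRate = 0.01): Decision {
  // A regression beyond the (assumed) threshold sends all traffic back to stable
  if (canaryErrorRate > maxErrorRate) return { action: 'rollback' };
  const i = CANARY_STEPS.indexOf(currentPercent);
  if (i === -1) throw new Error(`unknown canary step: ${currentPercent}%`);
  if (i === CANARY_STEPS.length - 1) return { action: 'complete' };
  return { action: 'promote', nextPercent: CANARY_STEPS[i + 1] };
}

console.log(nextCanaryStep(5, 0.002)); // { action: 'promote', nextPercent: 25 }
console.log(nextCanaryStep(25, 0.05)); // { action: 'rollback' }
```

Real controllers also compare the canary against the stable baseline and require a minimum observation window per step. Those checks are omitted here.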
### Feature Flags (Progressive Delivery)
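**Impact:** Combined with canary releases, flags shrink the blast radius of a bad release. A faulty feature can be switched off at runtime without a redeploy.
```typescript
import { init } from '@launchdarkly/node-server-sdk';

// Assumes an Express `app` and auth middleware that sets `req.user`
const client = init(process.env.LD_SDK_KEY!);
await client.waitForInitialization();

app.post('/checkout', async (req, res) => {
  // The evaluation context identifies who the flag is evaluated for
  const context = { kind: 'user', key: req.user.id };
  // `false` is the fallback value if the flag can't be evaluated
  const showNewCheckout = await client.variation('new-checkout', context, false);
  return showNewCheckout ? newCheckoutFlow(req, res) : oldCheckoutFlow(req, res);
});
```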
**Use Cases:**
- Gradual feature rollout
- A/B testing
- Kill switch for problematic features
- Decouple deployment from release
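
Percentage rollouts typically hash a stable user key into a bucket, so each user gets the same answer on every request. The following is a hypothetical in-memory evaluator, not LaunchDarkly's API. `FlagConfig`, `bucketFor`, and `isEnabled` are illustrative names showing a kill switch combined with a percentage rollout.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical flag shape for illustration (not a vendor SDK type)
interface FlagConfig {
  enabled: boolean; // Kill switch: false turns the feature off for everyone
  rolloutPercent: number; // 0-100 share of users who get the feature
}

// Map (flag, user) to a stable bucket in [0, 10000)
function bucketFor(flagKey: string, userKey: string): number {
  // Salting with the flag key keeps rollouts of different flags independent
  const digest = createHash('sha256').update(`${flagKey}:${userKey}`).digest();
  return digest.readUInt32BE(0) % 10000;
}

function isEnabled(flags: Map<string, FlagConfig>, flagKey: string, userKey: string): boolean {
  const flag = flags.get(flagKey);
  // Unknown flags fall back to "off", like the default value in an SDK call
  if (!flag || !flag.enabled) return false;
  return bucketFor(flagKey, userKey) < flag.rolloutPercent * 100;
}

const flags = new Map<string, FlagConfig>([
  ['new-checkout', { enabled: true, rolloutPercent: 25 }],
]);
// The same user always lands in the same bucket, so the answer is stable
console.log(isEnabled(flags, 'new-checkout', 'user-42') === isEnabled(flags, 'new-checkout', 'user-42')); // true
```

Raising `rolloutPercent` only adds users; nobody already in the rollout drops out, because each user's bucket never changes. For A/B tests, the same bucket can select a variant instead of on/off.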
## Containerization with Docker
### Multi-Stage Builds (Optimize Image Size)
```dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the build usually needs devDependencies (e.g. TypeScript)
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages reach the final image
RUN npm prune --omit=dev
# Production stage
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
# Copy only necessary files
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json ./
# Security: Run as non-root
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs
USER nodejs
EXPOSE 3000
CMD ["node", "dist/main.js"]
```
**Benefits:**
- Smaller images: compilers, dev tooling, and devDependencies stay in the builder stage
- Faster deployments
- Reduced attack surface
### Docker Compose (Local Development)
```yaml
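# The top-level `version` key is obsolete in Compose v2 and is omitted
services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy  # Wait until Postgres accepts connections
      redis:
        condition: service_started
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_PASSWORD=password  # Local development only
      - POSTGRES_DB=myapp
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
volumes:
  postgres-data: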
```
## Kubernetes Orchestration
### Deployment Manifest
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myregistry/api:v1.0.0
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health/liveness
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
```
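Resource quantities use Kubernetes units. `250m` is 250 millicores (a quarter of a CPU core). `Mi` is a binary suffix (2^20 bytes), unlike decimal `M` (10^6 bytes). The small sketch below converts the values used in the manifest. `parseCpu` and `parseMemory` are illustrative helpers that cover only the common suffixes, not the full Kubernetes quantity grammar.

```typescript
// Illustrative helpers (not a Kubernetes client API)
// CPU: "250m" = 250 millicores = 0.25 core; plain numbers are whole cores
function parseCpu(quantity: string): number {
  return quantity.endsWith('m') ? Number(quantity.slice(0, -1)) / 1000 : Number(quantity);
}

// Binary (Ki, Mi, Gi) and decimal (k, M, G) memory suffixes, in bytes
const MEMORY_SUFFIXES: Record<string, number> = {
  Ki: 2 ** 10,
  Mi: 2 ** 20,
  Gi: 2 ** 30,
  k: 1e3,
  M: 1e6,
  G: 1e9,
};

function parseMemory(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)([A-Za-z]*)$/.exec(quantity);
  if (!match) throw new Error(`unsupported quantity: ${quantity}`);
  const [, value, suffix] = match;
  if (suffix === '') return Number(value);
  const factor = MEMORY_SUFFIXES[suffix];
  if (factor === undefined) throw new Error(`unsupported suffix: ${suffix}`);
  return Number(value) * factor;
}

console.log(parseCpu('250m')); // 0.25
console.log(parseMemory('256Mi')); // 268435456
```

By the same arithmetic, the 512Mi memory limit is 536870912 bytes. Confusing `Mi` with `M` would understate it by roughly 5%.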
### Horizontal Pod Autoscaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Percent of the container's CPU request (250m)
```
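The HPA controller's core rule, as documented by Kubernetes, is `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`. The controller skips scaling when the ratio is within a tolerance (0.1 by default) and clamps the result to `minReplicas`/`maxReplicas`. The sketch below is simplified: it ignores stabilization windows, pod readiness, and missing-metric handling. `desiredReplicas` and `HpaSpec` are illustrative names.

```typescript
// Simplified model of the HPA scaling rule (illustrative names)
interface HpaSpec {
  minReplicas: number;
  maxReplicas: number;
  targetUtilization: number; // percent of the CPU request, e.g. 70
}

const TOLERANCE = 0.1; // Kubernetes default: ignore ratios within ±10% of 1.0

function desiredReplicas(spec: HpaSpec, currentReplicas: number, currentUtilization: number): number {
  const ratio = currentUtilization / spec.targetUtilization;
  // Close enough to the target: keep the current replica count
  if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas;
  const desired = Math.ceil(currentReplicas * ratio);
  return Math.min(spec.maxReplicas, Math.max(spec.minReplicas, desired));
}

const spec: HpaSpec = { minReplicas: 3, maxReplicas: 10, targetUtilization: 70 };
console.log(desiredReplicas(spec, 3, 90)); // 4  (ceil(3 × 90/70) = ceil(3.86))
console.log(desiredReplicas(spec, 3, 74)); // 3  (within tolerance)
console.log(desiredReplicas(spec, 8, 200)); // 10 (clamped to maxReplicas)
console.log(desiredReplicas(spec, 4, 20)); // 3  (clamped to minReplicas)
```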
## CI/CD Pipelines
### GitHub Actions (Modern, Integrated)
```yaml
name: CI/CD Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run tests
        run: npm run test:ci
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Snyk scan
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      # `docker scan` has been retired; Trivy is one common replacement
      - name: Container scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: CRITICAL,HIGH
          exit-code: '1'
  deploy:
    needs: [test, security]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write  # Required to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      # ghcr.io image names must be lowercase; adjust if the repository name has capitals
      - name: Build and push Docker image
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
          docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
      # Assumes an earlier step has configured kubectl credentials for the cluster
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/api-deployment api=ghcr.io/${{ github.repository }}:${{ github.sha }}
          kubectl rollout status deployment/api-deployment
```
## Monitoring & Observability
### Three Pillars of Observability
**1. Metrics (Prometheus + Grafana)**
```typescript
import express from 'express';
import { Counter, Histogram, collectDefaultMetrics, register } from 'prom-client';

const app = express();
// Process metrics: CPU, memory, event loop lag, GC
collectDefaultMetrics();
// Request counter
const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});
// Response time histogram
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route'],
  buckets: [0.1, 0.5, 1, 2, 5],
});
// Middleware to track metrics (register before the routes)
app.use((req, res, next) => {
  const endTimer = httpRequestDuration.startTimer();
  res.on('finish', () => {
    // Label by route template (e.g. /users/:id), never the raw URL, to keep
    // cardinality bounded; requests that matched no route share one value
    const route = req.route?.path ?? 'unmatched';
    httpRequestTotal.inc({ method: req.method, route, status: res.statusCode });
    endTimer({ method: req.method, route });
  });
  next();
});
// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
```
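Prometheus histograms are cumulative. Each bucket counts observations less than or equal to its `le` bound, and an implicit `+Inf` bucket equals the total count. The dependency-free sketch below shows what `buckets: [0.1, 0.5, 1, 2, 5]` records. prom-client does this bookkeeping internally; `cumulativeBuckets` is an illustrative name. Note that the 7 s request lands only in `+Inf`, so quantiles above the largest bound cannot be resolved; bucket choice matters.

```typescript
// Illustrative model of Prometheus histogram buckets (not prom-client internals)
interface HistogramSnapshot {
  buckets: Map<string, number>; // le bound -> cumulative count
  sum: number;
  count: number;
}

function cumulativeBuckets(bounds: number[], observations: number[]): HistogramSnapshot {
  const buckets = new Map<string, number>();
  for (const le of [...bounds].sort((a, b) => a - b)) {
    // Cumulative: count every observation at or below this bound
    buckets.set(String(le), observations.filter((v) => v <= le).length);
  }
  buckets.set('+Inf', observations.length);
  const sum = observations.reduce((acc, v) => acc + v, 0);
  return { buckets, sum, count: observations.length };
}

// Five request durations in seconds
const snapshot = cumulativeBuckets([0.1, 0.5, 1, 2, 5], [0.05, 0.3, 0.3, 1.5, 7]);
for (const [le, n] of snapshot.buckets) console.log(`le=${le}: ${n}`);
// le=0.1: 1, le=0.5: 3, le=1: 3, le=2: 4, le=5: 4, le=+Inf: 5
```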
**2. Logs (ELK Stack - Elasticsearch, Logstash, Kibana)**
```typescript
import winston from 'winston';
import { ElasticsearchTransport } from 'winston-elasticsearch';

const logger = winston.createLogger({
  level: 'info',
  // Timestamped JSON, one object per log entry
  format: winston.format.combine(winston.format.timestamp(), winston.format.json()),
  transports: [
    new winston.transports.Console(),
    new ElasticsearchTransport({
      level: 'info',
      clientOpts: { node: 'http://localhost:9200' },
      index: 'logs',
    }),
  ],
});
// Structured logging (inside a request handler): each field becomes a
// queryable key in Kibana. Log PII such as email or IP deliberately, if at all
logger.info('User created', {
  userId: user.id,
  email: user.email,
  ipAddress: req.ip,
  userAgent: req.headers['user-agent'],
});
```
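Structured logging means each entry is one machine-parseable object rather than an interpolated string. This is what lets Elasticsearch index fields like `userId`. The dependency-free sketch below shows that shape, including per-request child context, similar in spirit to winston's `logger.child`. `createJsonLogger` and its `sink` parameter are illustrative names, not a library API.

```typescript
// Illustrative structured logger (hypothetical names, not winston's API)
type Fields = Record<string, unknown>;

interface JsonLogger {
  info(message: string, fields?: Fields): void;
  error(message: string, fields?: Fields): void;
  child(context: Fields): JsonLogger;
}

// `sink` receives one JSON line per entry (stdout in production, an array in tests)
function createJsonLogger(sink: (line: string) => void, context: Fields = {}): JsonLogger {
  const write = (level: string, message: string, fields: Fields = {}) =>
    // Reserved keys go last so caller fields cannot overwrite them
    sink(JSON.stringify({ ...context, ...fields, timestamp: new Date().toISOString(), level, message }));
  return {
    info: (message, fields) => write('info', message, fields),
    error: (message, fields) => write('error', message, fields),
    // Child loggers stamp request-scoped fields (e.g. a request ID) on every entry
    child: (extra) => createJsonLogger(sink, { ...context, ...extra }),
  };
}

const lines: string[] = [];
const requestLogger = createJsonLogger((line) => lines.push(line)).child({ requestId: 'req-123' });
requestLogger.info('User created', { userId: 42 });
console.log(lines[0]);
```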
**3. Traces (Jaeger/OpenTelemetry)**
```typescript
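// tracing.ts: load before any other module (e.g. via node --require or --import)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'api-service',
  // Jaeger ingests OTLP natively; the dedicated Jaeger exporter is deprecated
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  // Auto-instrumentation is what captures HTTP requests, database queries, etc.
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();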
```
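Traces can span services because each outgoing request carries the trace context in a header. OpenTelemetry propagates it with the W3C Trace Context `traceparent` header by default. The header is formatted as `version-traceId-parentId-flags`. The sketch below parses and forwards that header. The auto-instrumentation normally does this for you; `parseTraceparent` and `childTraceparent` are illustrative helpers that handle only version `00`.

```typescript
import { randomBytes } from 'node:crypto';

// Illustrative W3C traceparent helpers (version 00 only)
interface TraceContext {
  traceId: string; // 32 hex chars, shared by every span in the trace
  parentId: string; // 16 hex chars, the calling span's ID
  sampled: boolean; // trace-flags bit 0
}

const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero IDs are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Outgoing call: keep the trace ID, but name the current span as the new parent
function childTraceparent(ctx: TraceContext): string {
  const spanId = randomBytes(8).toString('hex');
  return `00-${ctx.traceId}-${spanId}-${ctx.sampled ? '01' : '00'}`;
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
if (incoming) console.log(childTraceparent(incoming));
```

Because every hop keeps the same `traceId`, a backend such as Jaeger can stitch spans from different services into one trace.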
### Health Checks
```typescript
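// Liveness probe - Is the process running? Keep it free of dependency checks,
// or a database outage will make Kubernetes restart otherwise healthy pods
app.get('/health/liveness', (req, res) => {
  res.status(200).json({ status: 'ok', timestamp: Date.now() });
});
// Readiness probe - Is the app ready to serve traffic?
app.get('/health/readiness', async (req, res) => {
  // Run the checks in parallel; checkRedis/checkExternalAPI follow the
  // same try/catch pattern as checkDatabase below
  const [database, redis, externalAPI] = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkExternalAPI(),
  ]);
  // Caution: gating readiness on a third-party API pulls every pod out of
  // rotation whenever that API is down
  const checks = { database, redis, externalAPI };
  const isReady = Object.values(checks).every(Boolean);
  res.status(isReady ? 200 : 503).json({
    status: isReady ? 'ready' : 'not ready',
    checks,
  });
});
// `db` is the application's database client (e.g. a pg Pool)
async function checkDatabase(): Promise<boolean> {
  try {
    await db.query('SELECT 1');
    return true;
  } catch {
    return false;
  }
}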
```
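A dependency check that awaits without a time limit can hang the readiness response past the probe's own timeout (1 second by default in Kubernetes). This happens, for example, when a connection gets stuck in a network timeout. The sketch below bounds each check. `withTimeout`, `runChecks`, and the 500 ms budget are assumptions for illustration, not part of the handlers above.

```typescript
// Hypothetical helpers for bounding readiness checks
// Resolve to false if the check rejects, throws, or takes longer than `ms`
function withTimeout(check: () => Promise<boolean>, ms: number): Promise<boolean> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    Promise.resolve()
      .then(check) // also turns a synchronous throw into a rejection
      .then(resolve, () => resolve(false))
      .finally(() => clearTimeout(timer));
  });
}

// Run all named checks in parallel, each with its own time budget
async function runChecks(
  checks: Record<string, () => Promise<boolean>>,
  ms = 500, // assumed per-check budget, below the default 1 s probe timeout
): Promise<{ ready: boolean; results: Record<string, boolean> }> {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => [name, await withTimeout(check, ms)] as const),
  );
  const results: Record<string, boolean> = Object.fromEntries(entries);
  return { ready: Object.values(results).every(Boolean), results };
}

runChecks(
  {
    database: async () => true,
    slowCache: () => new Promise<boolean>(() => {}), // never settles
  },
  100,
).then(({ ready, results }) => console.log(ready, results));
// false { database: true, slowCache: false }
```

In a readiness handler, the result of `runChecks({ database: checkDatabase, redis: checkRedis })` can drive the 200/503 response directly.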
## Secrets Management
### HashiCorp Vault
```bash
# Store secret
vault kv put secret/myapp/db password=super-secret
# Retrieve secret
vault kv get -field=password secret/myapp/db
```
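Applications usually read secrets through Vault's HTTP API rather than the CLI. With the KV version 2 engine mounted at `secret/`, `vault kv get secret/myapp/db` maps to `GET /v1/secret/data/myapp/db`, and the key/value pairs sit under `data.data` in the JSON response. The sketch below assumes Node 18+ (`fetch`), the `VAULT_ADDR`/`VAULT_TOKEN` environment variables the Vault CLI also uses, and token auth for brevity. The helper names are illustrative.

```typescript
// Illustrative KV v2 reader (hypothetical helper names)
interface KvV2Response {
  data: { data: Record<string, string>; metadata: { version: number } };
}

// KV v2 inserts "data/" between the mount and the secret path
function kvV2Url(vaultAddr: string, mount: string, path: string): string {
  return `${vaultAddr.replace(/\/+$/, '')}/v1/${mount}/data/${path}`;
}

function extractField(body: KvV2Response, field: string): string {
  const value = body.data.data[field];
  if (value === undefined) throw new Error(`field "${field}" not found`);
  return value;
}

async function readSecretField(mount: string, path: string, field: string): Promise<string> {
  const addr = process.env.VAULT_ADDR ?? 'http://127.0.0.1:8200';
  const res = await fetch(kvV2Url(addr, mount, path), {
    headers: { 'X-Vault-Token': process.env.VAULT_TOKEN ?? '' },
  });
  if (!res.ok) throw new Error(`Vault returned HTTP ${res.status}`);
  return extractField((await res.json()) as KvV2Response, field);
}

// Equivalent of: vault kv get -field=password secret/myapp/db
// const password = await readSecretField('secret', 'myapp/db', 'password');
```

In production, a Kubernetes or AppRole auth method would typically replace the static token.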
### Kubernetes Secrets
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
stringData:
  url: postgresql://user:pass@host:5432/db
# Referenced from the Deployment's container spec
# (spec.template.spec.containers[].env), not a standalone manifest:
#   env:
#     - name: DATABASE_URL
#       valueFrom:
#         secretKeyRef:
#           name: db-secret
#           key: url
```
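`stringData` is a write-time convenience. The API server merges it into `data`, where values are stored base64-encoded, and base64 is an encoding, not encryption. The sketch below produces the equivalent `data` map and shows that anyone who can read the Secret can reverse it. `toSecretData` and `fromSecretData` are illustrative helpers.

```typescript
// Illustrative helpers: encode values the way the API server stores them under `data`
function toSecretData(values: Record<string, string>): Record<string, string> {
  return Object.fromEntries(
    Object.entries(values).map(([key, value]) => [key, Buffer.from(value, 'utf8').toString('base64')]),
  );
}

// Decoding needs no key: base64 offers no confidentiality
function fromSecretData(data: Record<string, string>): Record<string, string> {
  return Object.fromEntries(
    Object.entries(data).map(([key, value]) => [key, Buffer.from(value, 'base64').toString('utf8')]),
  );
}

const data = toSecretData({ url: 'postgresql://user:pass@host:5432/db' });
console.log(data.url);
```

Protecting the value at rest requires enabling encryption at rest in the cluster or keeping the secret in an external store such as Vault.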
## Infrastructure as Code (Terraform)
```hcl
# main.tf
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "main" {
  identifier                = "myapp-db"
  engine                    = "postgres"
  engine_version            = "15"
  instance_class            = "db.t3.micro"
  allocated_storage         = 20
  username                  = "myapp_admin" # "admin" is reserved by RDS for PostgreSQL
  password                  = var.db_password
  backup_retention_period   = 7
  skip_final_snapshot       = false
  final_snapshot_identifier = "myapp-db-final" # Required when skip_final_snapshot = false
}

resource "aws_elasticache_cluster" "redis" {
  cluster_id           = "myapp-redis"
  engine               = "redis"
  node_type            = "cache.t3.micro"
  num_cache_nodes      = 1
  parameter_group_name = "default.redis7"
}
```
## DevOps Checklist
- [ ] CI/CD pipeline configured (GitHub Actions/GitLab CI/Jenkins)
- [ ] Docker multi-stage builds implemented
- [ ] Kubernetes deployment manifests created
- [ ] Blue-green or canary deployment strategy
- [ ] Feature flags configured (LaunchDarkly/Unleash)
- [ ] Health checks (liveness + readiness probes)
- [ ] Monitoring: Prometheus + Grafana
- [ ] Logging: ELK Stack or similar
- [ ] Distributed tracing: Jaeger/OpenTelemetry
- [ ] Secrets management (Vault/AWS Secrets Manager)
- [ ] Infrastructure as Code (Terraform/CloudFormation)
- [ ] Autoscaling configured
- [ ] Backup and disaster recovery plan
## Resources
- **Kubernetes:** https://kubernetes.io/docs/
- **Docker:** https://docs.docker.com/
- **Prometheus:** https://prometheus.io/docs/
- **OpenTelemetry:** https://opentelemetry.io/docs/
- **Terraform:** https://www.terraform.io/docs/