english/.opencode/skills/devops/references/kubernetes-troubleshooting-advanced.md
2026-04-12 01:06:31 +07:00


Kubernetes Troubleshooting Advanced

Node Issues

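```bash
kubectl describe node <node-name> | grep -A 8 "Conditions:"   # header + Ready, MemoryPressure, DiskPressure, PIDPressure (and NetworkUnavailable on some clouds)
kubectl top node <node-name>                                  # needs metrics-server
kubectl top pods -A --sort-by=memory                          # heaviest pods across all namespaces
kubectl drain <node-name> --ignore-daemonsets                 # cordon + evict; add --delete-emptydir-data if pods use emptyDir volumes
kubectl uncordon <node-name>                                  # allow scheduling again after maintenance
```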
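Reading the Conditions table by eye gets tedious across many nodes. The sketch below is a minimal Python version that flags every standard condition not in its healthy state. `Ready` should be `True`, and the pressure and availability conditions should be `False`. It assumes you feed it the output of `kubectl get node <node-name> -o json`. The names `HEALTHY_STATUS`, `unhealthy_conditions`, and `unhealthy_from_json` are hypothetical helpers, not part of kubectl.

```python
import json

# Hypothetical lookup table: healthy value for each standard node condition.
# Ready should be "True"; the pressure/availability conditions should be "False".
HEALTHY_STATUS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
    "NetworkUnavailable": "False",
}


def unhealthy_conditions(node: dict) -> list[tuple[str, str, str]]:
    """Return (type, status, reason) for each standard condition not in its healthy state."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        expected = HEALTHY_STATUS.get(cond["type"])
        # Condition types outside the table (e.g. custom ones) are skipped.
        if expected is not None and cond["status"] != expected:
            problems.append((cond["type"], cond["status"], cond.get("reason", "")))
    return problems


def unhealthy_from_json(raw: str) -> list[tuple[str, str, str]]:
    """Hypothetical wrapper for the raw text of `kubectl get node <node-name> -o json`."""
    return unhealthy_conditions(json.loads(raw))
```

A status of `Unknown` on `Ready` is also flagged. That status typically means the node controller has stopped hearing from the node's kubelet.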

CrashLoopBackOff

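```bash
kubectl logs <pod-name> --previous                         # output of the last crashed instance (add -c <container> for multi-container pods)
kubectl describe pod <pod-name>                            # Events and Last State; exit code 137 often means OOMKilled
kubectl get pod <pod-name> -o yaml | grep -A 5 "resources:"
```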
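The "BackOff" in CrashLoopBackOff is the kubelet's restart delay. Under the long-standing defaults it starts at 10 s, doubles after each crash, and is capped at 300 s. It resets once a container has run for 10 minutes without problems. Newer Kubernetes versions may make these values tunable, so check your cluster's documentation.

The sketch below approximates that schedule under those defaults. It also pulls `lastState.terminated.reason` (for example `OOMKilled`) out of `kubectl get pod <pod-name> -o json`. Both function names are hypothetical.

```python
# Hypothetical helpers for illustration; not part of kubectl or any client library.


def crashloop_delays(restarts: int, initial: float = 10.0, cap: float = 300.0) -> list[float]:
    """Approximate back-off before each of the first `restarts` restarts.

    Assumes the default kubelet behavior (10 s, doubling, capped at 300 s) and
    ignores the reset that happens after 10 minutes of healthy running.
    """
    delays, delay = [], initial
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


def last_termination_reasons(pod: dict) -> dict[str, str]:
    """Map container name -> lastState.terminated.reason (e.g. "OOMKilled", "Error")."""
    reasons = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            reasons[status["name"]] = terminated.get("reason", "")
    return reasons
```

`crashloop_delays(7)` gives `[10, 20, 40, 80, 160, 300, 300]`. A pod that has crashed a handful of times therefore already waits five minutes between attempts, which is why fixing the root cause beats waiting it out.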

HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the pods' CPU *requests*; needs metrics-server
```
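The HPA documentation gives the core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The controller skips scaling when the ratio is within a tolerance of 0.1 of 1.0 by default. The function below is a minimal sketch of that rule for the single CPU-utilization metric above, with the manifest's `minReplicas`/`maxReplicas` as defaults. It deliberately ignores the stabilization window, `behavior` policies, and the special handling of unready or missing pods, so treat its output as an estimate. `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 70.0,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Hypothetical sketch of the HPA formula for one Resource/Utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds declared in the HPA spec.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 140% of their CPU request against a 70% target gives a ratio of 2.0, so the HPA asks for 8 replicas. 8 pods at the same load would want 16 but are clamped to `maxReplicas: 10`.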

Anti-Patterns

Using the latest tag:

```yaml
# ❌ image: myapp:latest
# ✅ image: myapp:v1.2.3
```
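When the tag is omitted or is `:latest`, Kubernetes defaults `imagePullPolicy` to `Always`. Pods created at different times can then run different images. A small pre-deploy check is sketched below with a hypothetical `image_tag_problem` helper. It treats digest references (`@sha256:...`) as pinned. It does not validate the full image-reference grammar.

```python
from typing import Optional


def image_tag_problem(image: str) -> Optional[str]:
    """Hypothetical check: return a warning for mutable image references, else None."""
    if "@" in image:
        return None  # digest-pinned, immutable
    # A colon before the last "/" is a registry port (registry.local:5000/app), not a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "no tag (defaults to :latest)"
    if last_segment.rsplit(":", 1)[1] == "latest":
        return "uses :latest"
    return None
```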

Missing resource requests and limits:

```yaml
# ✅ Always set
resources:
  requests: { memory: "256Mi", cpu: "250m" }
  limits: { memory: "512Mi", cpu: "500m" }
```
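Requests drive scheduling and the HPA's utilization percentage. Limits cap usage: a container that exceeds its memory limit is OOM-killed, while CPU use above the limit is throttled. A pre-flight lint for the rule above is sketched below. `parse_quantity` and `check_resources` are hypothetical helpers. The parser covers only a subset of the Kubernetes quantity grammar: the `Ki`..`Ti`, `k`..`T`, and `m` suffixes, with no exponent forms.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes: binary (Ki..Ti), decimal (k..T), milli (m).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Hypothetical parser: "256Mi" -> bytes, "250m" -> cores, "2" -> 2."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def check_resources(resources: dict) -> list[str]:
    """Flag missing cpu/memory requests or limits, and requests above limits."""
    problems = []
    for kind in ("requests", "limits"):
        for res in ("cpu", "memory"):
            if res not in resources.get(kind, {}):
                problems.append(f"missing {kind}.{res}")
    for res in ("cpu", "memory"):
        req = resources.get("requests", {}).get(res)
        lim = resources.get("limits", {}).get(res)
        if req and lim and parse_quantity(req) > parse_quantity(lim):
            problems.append(f"requests.{res} > limits.{res}")
    return problems
```

Decimal arithmetic keeps `250m` exact (0.25 cores) instead of introducing float rounding.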

Missing health checks:

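```yaml
livenessProbe:            # failing liveness restarts the container
  httpGet: { path: /health, port: 8080 }
readinessProbe:           # failing readiness removes the pod from Service endpoints
  httpGet: { path: /ready, port: 8080 }
```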
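The probes only help if the application answers them sensibly. The kubelet counts any HTTP status from 200 to 399 as success. Below is a minimal sketch of the application side using Python's standard `http.server`, serving the `/health` and `/ready` paths on port 8080 from the manifest above. The app, the `READY` flag, and the helper names are hypothetical. The probe logic sits in the pure `probe_status` function so it can be tested without a socket.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical app state: set it once caches, DB connections, etc. are warm.
READY = threading.Event()


def probe_status(path: str, ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/health":
        return 200  # liveness: the process is up and answering
    if path == "/ready":
        return 200 if ready else 503  # readiness: accept traffic only when warmed up
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, READY.is_set()))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging; probes fire every few seconds


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Run the probe endpoints on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping `/health` independent of downstream dependencies is a deliberate choice. If liveness failed whenever a database was slow, the kubelet would restart every replica at once instead of just taking them out of rotation via readiness.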

Running as root:

```yaml
securityContext:
  runAsNonRoot: true   # kubelet refuses to start a container that would run as UID 0
  runAsUser: 1000
```
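`securityContext` can be set at the pod level and per container, and container-level fields override pod-level ones. A pod can therefore declare `runAsUser: 1000` while one container still runs as UID 0. Below is a minimal audit sketch over a pod `spec` dict, such as `.spec` from `kubectl get pod <pod-name> -o json`. `effective` and `root_risks` are hypothetical names. The sketch looks only at `runAsUser`/`runAsNonRoot` in `containers`. It skips `initContainers`, capabilities, and `privileged`.

```python
# Hypothetical audit helpers; only runAsUser/runAsNonRoot are considered.


def effective(field: str, pod_sc: dict, container_sc: dict):
    """Container-level securityContext overrides the pod-level value."""
    if field in container_sc:
        return container_sc[field]
    return pod_sc.get(field)


def root_risks(pod_spec: dict) -> list[str]:
    """List containers that are not guaranteed to run as a non-root user."""
    pod_sc = pod_spec.get("securityContext", {})
    risks = []
    for c in pod_spec.get("containers", []):
        c_sc = c.get("securityContext", {})
        uid = effective("runAsUser", pod_sc, c_sc)
        non_root = effective("runAsNonRoot", pod_sc, c_sc)
        if uid == 0:
            risks.append(f"{c['name']}: runAsUser is 0 (root)")
        elif not non_root and uid is None:
            # Neither field set: the image's default USER decides, often root.
            risks.append(f"{c['name']}: no runAsNonRoot/runAsUser; image default user applies")
    return risks
```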

Monitoring

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
```
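Once the stack is running, the Prometheus HTTP API (`/api/v1/query`) is convenient for scripted checks. The sketch below assumes Prometheus has been port-forwarded to `localhost:9090`. The service name depends on the Helm release, so `<prometheus-service>` in the comment is a placeholder. The query uses cAdvisor's `container_memory_working_set_bytes`, which typical installs of the chart scrape by default; verify this in your setup. `query_url`, `parse_vector`, and `query` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: `kubectl port-forward -n monitoring svc/<prometheus-service> 9090` is running.
PROM_URL = "http://localhost:9090"

# Working-set memory per namespace; container!="" drops cgroup-level aggregate series.
MEMORY_BY_NS = 'sum by (namespace) (container_memory_working_set_bytes{container!=""})'


def query_url(expr: str, base: str = PROM_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body: dict) -> dict[str, float]:
    """Turn an instant-vector response into {namespace: value}."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return {r["metric"].get("namespace", ""): float(r["value"][1])
            for r in body["data"]["result"]}


def query(expr: str = MEMORY_BY_NS, base: str = PROM_URL) -> dict[str, float]:
    """Run an instant query against a reachable Prometheus (network call)."""
    with urllib.request.urlopen(query_url(expr, base), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Sample values arrive as strings in the API's `[timestamp, "value"]` pairs, hence the `float(...)` conversion.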