Understanding Kubernetes Architecture

Core Components

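Kubernetes consists of a control plane (API server, controller manager, scheduler, and etcd) and node components (kubelet, kube-proxy, and a container runtime). Failures often arise when communication with the control plane breaks (for example, a kubelet can no longer reach the API server and its node turns NotReady) or when node conditions change unexpectedly (such as MemoryPressure or DiskPressure).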

Workload Deployment Model

Kubernetes workloads are defined in declarative manifests using controllers such as Deployments, StatefulSets, DaemonSets, and Jobs. Problems commonly stem from misconfigured manifests, inappropriate resource requests, or container images that are broken or cannot be pulled.
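
"Declarative" means a controller repeatedly compares the desired state from the manifest with the state it observes and acts only on the difference. The following is a minimal, hypothetical sketch of that idea for a replica count; real controllers watch the API server and create or delete Pods through it, and the function and pod names here are illustrative only.

```python
def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the actions needed to move the observed pods toward the desired count.

    Hypothetical, simplified model of a ReplicaSet-style reconcile step.
    """
    current = len(running_pods)
    if current < desired_replicas:
        # Too few pods: create the missing ones.
        return ["create-pod"] * (desired_replicas - current)
    if current > desired_replicas:
        # Too many pods: delete the surplus (here, simply the last ones listed).
        return [f"delete-pod:{name}" for name in running_pods[desired_replicas:]]
    # Observed state already matches the manifest: nothing to do.
    return []
```

Because the loop acts only on the difference, re-applying an unchanged manifest is a no-op, which is what makes declarative workflows safe to repeat.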

Common Kubernetes Issues

1. Pods Stuck in Pending State

Occurs when no node has enough unreserved resources for the pod's requests, when node taints are not tolerated by the pod, or when affinity/anti-affinity rules cannot be satisfied.
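
The resource part of this check compares the pod's requests (not actual usage) with each node's allocatable capacity minus the requests of pods already placed on it. A minimal sketch under simplifying assumptions: the function names are hypothetical, and the quantity parser understands only a subset of Kubernetes suffixes ("m" for millicores; Ki, Mi, Gi for memory; plain numbers otherwise).

```python
# Subset of binary memory suffixes; decimal suffixes (k, M, G) are not handled.
SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_quantity(q: str) -> float:
    """Convert a quantity such as '500m', '2', or '256Mi' to a base-unit number."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000  # millicores -> cores
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return float(q)

def insufficient_resources(pod_requests: dict, allocatable: dict,
                           already_requested: dict) -> list[str]:
    """Return the resources the node lacks room for (an empty list means the pod fits)."""
    short = []
    for resource, amount in pod_requests.items():
        # Free capacity = allocatable minus what other pods have already requested.
        free = (parse_quantity(allocatable.get(resource, "0"))
                - parse_quantity(already_requested.get(resource, "0")))
        if parse_quantity(amount) > free:
            short.append(resource)
    return short
```

For a node with 2 allocatable CPUs of which 1500m is already requested, a pod requesting 600m CPU is reported as short on cpu even if the node's measured CPU usage is low.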

2. CrashLoopBackOff or ImagePullBackOff Errors

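CrashLoopBackOff means the container starts and then keeps exiting, typically because of application-level failures, incorrect entrypoints or commands, or missing configuration such as ConfigMaps or Secrets. ImagePullBackOff means the kubelet cannot pull the image, usually because of a wrong image name or tag, an inaccessible container registry, or missing registry credentials.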
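
The "BackOff" part refers to the kubelet waiting longer between restarts after each crash. The Kubernetes documentation describes the default as an exponential delay starting at 10 seconds, doubling each time, and capped at 5 minutes; newer releases may make this tunable, so treat the constants below as assumptions for your cluster version. A small sketch of that schedule:

```python
# Documented default restart back-off (assumed; may differ in newer releases).
INITIAL_DELAY_S = 10
MAX_DELAY_S = 300

def backoff_delays(restarts: int) -> list[int]:
    """Return the wait in seconds before each of the first `restarts` restarts."""
    delays = []
    delay = INITIAL_DELAY_S
    for _ in range(restarts):
        delays.append(delay)
        # Double the delay after every crash, but never exceed the cap.
        delay = min(delay * 2, MAX_DELAY_S)
    return delays
```

For example, backoff_delays(7) gives [10, 20, 40, 80, 160, 300, 300], which is why a persistently crashing pod ends up restarting only about once every five minutes. The documentation also states that the timer resets once a container has run for 10 minutes without problems.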

3. Persistent Volume Claims (PVC) Not Bound

Caused by a missing or mismatched StorageClass, unsupported access modes, insufficient capacity, or no PersistentVolume having been provisioned.

4. Ingress or Service Not Routing Traffic

Often due to misconfigured Ingress rules, missing annotations, incorrect service selectors, or CNI plugin/networking issues.
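
Path rules are a frequent source of misrouting. Per the Ingress specification, pathType: Prefix matches element by element on "/"-separated path segments (so /foo matches /foo and /foo/bar but not /foobar), pathType: Exact requires an exact, case-sensitive match, and when several paths match, the longest one wins. Each Ingress controller implements these rules itself, so this is only a sketch; the function and backend names are hypothetical.

```python
def _elements(path: str) -> list[str]:
    # Split a URL path into "/"-separated elements, dropping empty ones
    # so that "/foo" and "/foo/" are treated the same for Prefix matching.
    return [seg for seg in path.split("/") if seg]

def path_matches(rule_path: str, path_type: str, request_path: str) -> bool:
    if path_type == "Exact":
        # Case-sensitive and without trailing-slash leniency.
        return request_path == rule_path
    # Prefix: each rule element must equal the corresponding request element.
    rule = _elements(rule_path)
    return _elements(request_path)[: len(rule)] == rule

def pick_backend(rules: list[tuple[str, str, str]], request_path: str):
    """rules: (path, pathType, backend service). Returns the chosen backend or None."""
    matching = [r for r in rules if path_matches(r[0], r[1], request_path)]
    if not matching:
        return None
    # Longest path wins; on a tie, Exact is preferred over Prefix.
    return max(matching, key=lambda r: (len(r[0]), r[1] == "Exact"))[2]
```

With rules [("/", "Prefix", "frontend"), ("/api", "Prefix", "api-svc")], a request for /api/users goes to api-svc, while /apiv2 falls through to frontend.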

5. DNS Resolution Fails Inside Pods

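Results from CoreDNS misconfiguration or unhealthy CoreDNS pods, NetworkPolicies that block traffic to the DNS Service on port 53, or overridden DNS settings (the kubelet's clusterDNS/resolvConf configuration, or a pod's dnsPolicy and dnsConfig).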
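
Overridden settings matter because short Service names only resolve through the search list in the pod's /etc/resolv.conf. With the default ClusterFirst policy that file typically contains search domains like <namespace>.svc.cluster.local svc.cluster.local cluster.local and options ndots:5. The sketch below models how a glibc-style resolver expands a name under those options; other resolvers (for example musl) differ in details, and the "default" namespace and cluster.local domain in the example are assumptions.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the fully qualified names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search domains.
        return [absolute] + expanded
    # Too few dots: try the search domains first, then the name as-is.
    return expanded + [absolute]
```

With the search list above, my-svc is tried as my-svc.default.svc.cluster.local. first. If an override drops the cluster search domains, that first candidate disappears and short names stop resolving, while external names with fewer than five dots generate several extra lookups before the real one.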

Diagnostics and Debugging Techniques

Use kubectl describe and logs

Retrieve the detailed state and recent events of a pod, and the logs of one of its containers:

kubectl describe pod my-app-pod
kubectl logs my-app-pod -c app-container

Check Node Conditions and Scheduler Events

Identify resource pressure or taints preventing pod scheduling:

kubectl get nodes
kubectl describe node node-name
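
The same information is available as JSON from kubectl get node node-name -o json. Below is a minimal sketch, assuming that output has already been loaded into a Python dict (for example with json.load), that flags the conditions and settings most likely to block scheduling: a Ready condition that is not True, any pressure condition that is True, a cordoned node (spec.unschedulable), and taints. The function name is hypothetical.

```python
# Condition types that indicate a problem when their status is "True".
PRESSURE_CONDITIONS = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}

def node_problems(node: dict) -> list[str]:
    """Summarize scheduling-relevant problems from a node object's JSON."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        if ctype == "Ready" and status != "True":
            problems.append(f"NotReady (Ready={status})")
        elif ctype in PRESSURE_CONDITIONS and status == "True":
            problems.append(ctype)
    spec = node.get("spec", {})
    if spec.get("unschedulable"):
        # Set by kubectl cordon.
        problems.append("cordoned (spec.unschedulable)")
    for taint in spec.get("taints", []):
        problems.append(f"taint {taint.get('key')}:{taint.get('effect')}")
    return problems
```

An empty list means none of these checks found a reason the node should refuse new pods.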

Test Service and Network Reachability

Use a temporary pod with tools such as nslookup, wget, and ping to debug DNS and network connectivity (the busybox image includes wget but not curl):

kubectl run -it --rm debug --image=busybox -- sh
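
Note that ping is a weak test for Services: with kube-proxy in its default iptables mode, a ClusterIP usually does not answer ICMP even when the Service works. A TCP connection attempt exercises the same path as real traffic. busybox has no Python, so this sketch assumes a debug image that does (for example python:3-slim, started the same way with a different --image); the function name and the host and port in the usage note are hypothetical.

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name (exercising DNS) and then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

For example, tcp_reachable("my-service.my-namespace.svc.cluster.local", 80) returning False while the same check against a pod IP succeeds points at the Service selector or kube-proxy rather than the application.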

Validate PVC and PV Relationships

Check that the PVC requests a StorageClass and access mode that an available PV or provisioner can satisfy:

kubectl get pvc
kubectl describe pvc my-pvc

Inspect Events and Deployment Rollouts

Track rollouts and failed deployments:

kubectl rollout status deployment/my-app
kubectl get events --sort-by=.metadata.creationTimestamp
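
When the event list is long, Warning events are usually the ones that matter; kubectl get events --field-selector type=Warning filters them on the server. To post-process the output of kubectl get events -o json instead, a minimal sketch (hypothetical function name) that keeps Warning events and sorts them oldest first:

```python
def warning_events(event_list: dict) -> list[str]:
    """Return 'timestamp kind/name reason: message' lines for Warning events, oldest first."""
    warnings = [e for e in event_list.get("items", []) if e.get("type") == "Warning"]
    # RFC 3339 UTC timestamps ("...Z") in the same format sort correctly as strings.
    warnings.sort(key=lambda e: e["metadata"]["creationTimestamp"])
    return [
        f'{e["metadata"]["creationTimestamp"]} '
        f'{e["involvedObject"]["kind"]}/{e["involvedObject"]["name"]} '
        f'{e["reason"]}: {e["message"]}'
        for e in warnings
    ]
```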

Step-by-Step Resolution Guide

1. Fix Pending Pods

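Check whether nodes have enough allocatable resources and whether taints are preventing scheduling. The pod's Events section usually states the reason, for example Insufficient cpu or an untolerated taint: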

kubectl describe pod pod-name
kubectl describe node node-name
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Then add matching tolerations, relax affinity rules, lower the pod's resource requests, or add node capacity, depending on the reported cause.
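
Whether a toleration "matches" follows the rules in the Kubernetes taints and tolerations documentation: the effects must match (an empty toleration effect matches any effect), and either the operator is Exists (an empty key then tolerates every taint) or the operator is Equal (the default) and both key and value match. Only NoSchedule and NoExecute taints block scheduling; PreferNoSchedule is a soft preference. A simplified sketch whose dicts mirror the spec.taints and spec.tolerations fields; the function names are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect in the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists ignores the value; an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint.get("key")
    # Equal: key and value must both match.
    return (toleration.get("key") == taint.get("key")
            and toleration.get("value", "") == taint.get("value", ""))

def untolerated_taints(tolerations: list[dict], taints: list[dict]) -> list[dict]:
    """Return the hard taints (NoSchedule/NoExecute) that no toleration matches."""
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return [t for t in hard if not any(tolerates(tol, t) for tol in tolerations)]
```

A non-empty result for every node explains a Pending pod; adding a toleration that matches the listed taint, with the same key, value, and effect, is the usual fix.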

2. Resolve CrashLoopBackOff Errors

Inspect the logs of the previous (crashed) container instance and check the startup command. Confirm that referenced environment variables, ConfigMaps, and Secret volumes exist and are mounted:

kubectl logs pod-name -c container-name --previous
kubectl describe pod pod-name
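
When the logs are empty, the container's last termination state often explains the crash. A minimal sketch, assuming the output of kubectl get pod pod-name -o json has been loaded into a dict, that summarizes each container's restarts from status.containerStatuses (the function name is hypothetical). An exit code of 137 with reason OOMKilled points at the memory limit; other non-zero exit codes with reason Error usually point at the application itself.

```python
def restart_summary(pod: dict) -> list[dict]:
    """Summarize restart count, current waiting reason, and last termination per container."""
    summary = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated describes the most recent crash, if any.
        last = cs.get("lastState", {}).get("terminated", {})
        summary.append({
            "container": cs["name"],
            "restarts": cs.get("restartCount", 0),
            "waiting_reason": cs.get("state", {}).get("waiting", {}).get("reason"),
            "last_exit_code": last.get("exitCode"),
            "last_reason": last.get("reason"),
        })
    return summary
```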

3. Bind PVCs Correctly

Ensure the StorageClass exists and that its provisioner supports the requested access mode:

kubectl get sc
kubectl describe pv pv-name

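If needed, adjust the PVC or manually create a PV that matches its StorageClass, access mode, and capacity. If the StorageClass uses volumeBindingMode: WaitForFirstConsumer, a Pending PVC is expected until a pod that uses it is scheduled.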
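
For statically provisioned volumes, the binding decision can be approximated as: pick an Available PV with the same storageClassName, a superset of the requested access modes, and at least the requested capacity, preferring the smallest suitable volume. This simplified model ignores label selectors, volumeMode, and node affinity; the class names, volume names, and whole-GiB capacities are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PV:
    name: str
    storage_class: str
    access_modes: set[str]
    capacity_gi: int
    phase: str = "Available"

@dataclass
class PVC:
    storage_class: str
    access_modes: set[str]
    request_gi: int

def find_match(pvc: PVC, pvs: list[PV]):
    """Return the smallest PV that satisfies the claim, or None if the PVC would stay Pending."""
    candidates = [
        pv for pv in pvs
        if pv.phase == "Available"
        and pv.storage_class == pvc.storage_class
        and pvc.access_modes <= pv.access_modes  # PV must offer every requested mode
        and pv.capacity_gi >= pvc.request_gi
    ]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)
```

A None result corresponds to the cases above: no volume in that class, no volume offering the access mode, or no volume large enough.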

4. Repair Ingress and Service Issues

Check that an Ingress controller is installed and running, then validate the Ingress annotations, hosts, and paths:

kubectl get ingress
kubectl describe ingress ingress-name

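Verify that the backend Service exists and that its selector matches the pod labels; kubectl get endpoints service-name should list the IPs of the ready pods.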
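
A Service selects the pods whose labels contain every key/value pair in its selector, and matching is exact and case-sensitive. A minimal sketch of that rule with hypothetical pod names and labels; in a real cluster, only selected pods that also pass their readiness probes appear as ready endpoints.

```python
def selected_pods(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of pods whose labels contain every key=value in the selector."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return sorted(
        name for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )
```

A typo such as app: Web instead of app: web yields an empty list, which shows up as a Service with no endpoints.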

5. Restore DNS Resolution

Check CoreDNS logs and ConfigMap:

kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl edit configmap coredns -n kube-system

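Ensure pods can reach the cluster DNS Service IP (10.96.0.10 in many kubeadm clusters; compare the nameserver line in a pod's /etc/resolv.conf with the kube-dns Service in kube-system).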
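
The cluster DNS IP can be read with kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}' (the Service is usually named kube-dns even when CoreDNS serves it). A small sketch that parses the text of a pod's /etc/resolv.conf and checks whether its first nameserver is that IP; the function names are hypothetical, and 10.96.0.10 in the example is only the common kubeadm default.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameservers, search domains, and options from resolv.conf text."""
    conf = {"nameservers": [], "search": [], "options": []}
    for raw in text.splitlines():
        # Strip comments ('#' or ';') and surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf["nameservers"].append(values[0])
        elif keyword == "search":
            conf["search"] = values  # the last search line wins
        elif keyword == "options":
            conf["options"].extend(values)
    return conf

def uses_cluster_dns(text: str, cluster_dns_ip: str) -> bool:
    """True if the first nameserver in the resolv.conf text is the cluster DNS IP."""
    nameservers = parse_resolv_conf(text)["nameservers"]
    return bool(nameservers) and nameservers[0] == cluster_dns_ip
```

If the first nameserver is something else, the pod's dnsPolicy/dnsConfig or the kubelet's clusterDNS setting is the next place to look.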

Best Practices for Kubernetes Operations

  • Use readiness probes to keep traffic away from pods that are not ready, and liveness probes to restart containers that have stopped responding.
  • Use kubectl diff and kubectl apply --server-side to manage declarative resources safely.
  • Label resources consistently to improve targeting and maintenance.
  • Set resource requests (which the scheduler uses for placement) and limits (which cap usage at runtime) to prevent noisy-neighbor issues.
  • Monitor with Prometheus and Grafana, and alert on common symptoms such as frequent pod restarts or sustained high CPU usage.
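
Requests and limits also determine a pod's QoS class, which roughly sets eviction order under node memory pressure (BestEffort first, Guaranteed last). Per the Kubernetes documentation, a pod is Guaranteed when every container has CPU and memory limits with requests equal to them (an unset request defaults to the limit), BestEffort when no container sets any CPU or memory request or limit, and Burstable otherwise. A simplified sketch that compares quantities as strings, so it assumes both are written in the same notation; the function names are hypothetical.

```python
RESOURCES = ("cpu", "memory")

def _guaranteed(container: dict) -> bool:
    limits = container.get("limits", {})
    requests = container.get("requests", {})
    # Every resource needs a limit; an unset request defaults to the limit.
    return all(r in limits and requests.get(r, limits[r]) == limits[r] for r in RESOURCES)

def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' {"requests": {...}, "limits": {...}} dicts."""
    if not any(c.get("requests", {}).get(r) or c.get("limits", {}).get(r)
               for c in containers for r in RESOURCES):
        return "BestEffort"
    if all(_guaranteed(c) for c in containers):
        return "Guaranteed"
    return "Burstable"
```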

Conclusion

Kubernetes enables powerful orchestration at scale, but it demands disciplined operations and a solid understanding of its primitives. Most production issues stem from misconfigured workloads, mismatched resource settings, or overlooked environment-specific constraints. With effective use of kubectl, logs, events, and resource validation, DevOps teams can resolve these common pitfalls and keep their clusters resilient.

FAQs

1. Why is my pod stuck in Pending state?

Check for insufficient resources, untolerated node taints, or unsatisfiable affinity rules. The Events section of kubectl describe pod shows which scheduling constraint failed.

2. What causes CrashLoopBackOff in Kubernetes?

Usually an application failure on startup, invalid commands, or missing dependencies. Review the logs of the previous container instance (kubectl logs --previous) and check any init containers.

3. Why isn't my PVC binding to a PV?

The PVC may request an unsupported access mode or a nonexistent StorageClass, or no available PV may have enough capacity. Ensure the PVC and PV are compatible.

4. Why is my Ingress not routing traffic?

The Ingress controller may not be deployed, or the rules or annotations may be misconfigured. Also verify DNS and the backend Service endpoints.

5. How do I fix DNS resolution issues in pods?

Inspect CoreDNS logs, verify pod resolv.conf contents, and ensure network policies or iptables rules aren't blocking DNS access.