Understanding Node Draining in GKE
How GKE Handles Node Upgrades
GKE automates node pool upgrades by cordoning, draining, and recreating nodes. Draining respects Kubernetes constraints such as PodDisruptionBudgets, and evicted pods must still satisfy readiness probes and affinity rules when they land elsewhere. If a pod cannot be evicted or rescheduled, the drain stalls and the upgrade appears to hang.
kubectl drain gke-node-name --ignore-daemonsets --delete-emptydir-data
Role of the Cluster Autoscaler
The GKE cluster autoscaler may attempt to scale down underutilized nodes. It uses similar eviction logic, but it skips a node entirely if any pod on it is deemed non-evictable due to configuration (such as a restrictive PDB) or runtime state.
Root Causes of Node Drain Failures
Misconfigured PodDisruptionBudgets (PDBs)
PDBs control how many replicas of a workload can be simultaneously unavailable. Overly strict PDBs (e.g., maxUnavailable: 0) can block all evictions during upgrades or scale-downs.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: strict-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: my-critical-service
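With two or more replicas running, a PDB that permits at least one disruption keeps drains moving. A minimal corrected sketch (the relaxed-pdb name is illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: relaxed-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-critical-service

minAvailable can express the same intent from the other direction; just avoid setting it equal to the current replica count, which is effectively maxUnavailable: 0.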
Insufficient Scheduling Capacity
When a pod cannot be rescheduled due to taints, affinity rules, or resource exhaustion, the drain stalls. GKE does not force migration unless preemption or overprovisioning is in place.
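To confirm this is the failure mode, check for pods stuck in Pending and for FailedScheduling events; both commands below use standard kubectl field selectors:

kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl get events --all-namespaces --field-selector=reason=FailedScheduling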
StatefulSets and Node Affinity
Stateful workloads with persistent volume claims (PVCs) and hard node affinity may pin pods to specific nodes, making eviction and re-scheduling impossible within existing constraints.
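As an illustration, a pod hard-pinned to one node by hostname; the pod name and image are placeholders. During a drain, this pod has nowhere else it is allowed to go:

apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod                  # hypothetical example
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - gke-node-name   # hard-pins the pod to a single node
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "infinity"]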
Diagnostics and Observability Techniques
Use Events and Describe Outputs
Check node and pod events for eviction failures. Focus on messages like "Cannot evict pod as it would violate the pod's disruption budget."
kubectl describe node gke-node-name
kubectl get events --field-selector involvedObject.name=gke-node-name
Analyze Autoscaler Logs
Enable GKE autoscaler logging and review logs in Cloud Logging. Look for scale-down candidates rejected due to "no reschedule options" or "PDB violation" errors.
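A query sketch against the cluster autoscaler visibility logs; the project and cluster names are placeholders, and the exact filter fields may vary by GKE version:

gcloud logging read \
  'resource.type="k8s_cluster" AND resource.labels.cluster_name="my-cluster" AND logName:"cluster-autoscaler-visibility" AND jsonPayload.noDecisionStatus.noScaleDown:*' \
  --project=my-project \
  --limit=10

Each noScaleDown entry lists per-node reasons explaining why a scale-down candidate was not removed.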
Simulate Drains with Dry-Run
Use dry-run mode to preview which pods a drain would evict. This helps validate PDB logic and rescheduling viability before rolling upgrades; note that a client-side dry run does not call the eviction API, so it previews scope rather than proving every eviction will succeed.
kubectl drain gke-node-name --dry-run=client --ignore-daemonsets
Step-by-Step Mitigation Strategy
- Audit all PodDisruptionBudgets and ensure they allow at least one pod disruption.
- Use a PriorityClass to allow preemption for critical workloads during rescheduling.
- Provision buffer nodes or use overprovisioning with low-priority pods to absorb migration spikes (see the sketch after this list).
- Avoid hard nodeAffinity unless absolutely necessary; prefer preferredDuringSchedulingIgnoredDuringExecution.
- Upgrade StatefulSets cautiously: consider partitioned rollouts and PVC re-attachment readiness.
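A minimal overprovisioning sketch: a negative-priority "balloon" Deployment of pause pods that reserves headroom and is preempted first when real workloads need to land. Names, replica count, and resource requests here are illustrative assumptions:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning            # hypothetical name
value: -10                          # below the default (0), so these pods are preempted first
globalDefault: false
description: "Placeholder pods that reserve headroom for displaced workloads."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-buffer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning-buffer
  template:
    metadata:
      labels:
        app: overprovisioning-buffer
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m
              memory: 512Mi

When a drain displaces real pods, the scheduler preempts these placeholders to make room immediately; the now-pending balloon pods then trigger a scale-up that restores the buffer.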
Best Practices for Stable GKE Upgrades
- Run scheduled drain simulations weekly in staging environments.
- Regularly validate PDB logic against current replica counts.
- Use vertical and horizontal pod autoscalers to right-size workloads.
- Annotate workloads that are safe to move (e.g., the cluster-autoscaler.kubernetes.io/safe-to-evict: "true" pod annotation, shown below) to ease upgrades and scale-downs.
- Monitor drain duration metrics and flag any nodes exceeding 10 minutes during upgrade.
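A sketch of the eviction-tolerance annotation on a pod; the workload name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                # hypothetical workload
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]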
Conclusion
GKE node drain failures during autoscaling or upgrades are rooted in Kubernetes-native eviction mechanics, amplified by poor disruption planning or rigid workload definitions. Enterprise teams must proactively align their scheduling policies, disruption budgets, and affinity rules with GKE's orchestration patterns. By simulating drains, enabling autoscaler insights, and designing for graceful disruption, operations teams can avoid stalled rollouts, minimize downtime, and ensure continuous delivery in production-grade GKE environments.
FAQs
1. Why do node pools hang during GKE upgrades?
Usually due to PDB violations or unschedulable pods. A manual kubectl drain waits indefinitely by default, and GKE's automated drain retries at length before forcing eviction.
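For manual operations, a bounded drain fails fast instead of hanging; the five-minute timeout below is an arbitrary choice:

kubectl drain gke-node-name --ignore-daemonsets --delete-emptydir-data --timeout=5m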
2. How do I detect which pod is blocking a node drain?
Use kubectl drain with verbose output, or kubectl describe node, to list non-evictable pods and their corresponding error messages.
3. Can GKE force drain nodes even with PDBs in place?
Not indefinitely. GKE honors active PDB constraints while draining, but an automated upgrade waits only for a bounded grace period (up to one hour) before force-evicting the remaining pods. The cluster autoscaler, by contrast, simply declines to scale down a node whose pods would violate a PDB.
4. What’s the best way to handle StatefulSet upgrades?
Use the rollingUpdate strategy with partitioning and readiness gates. Ensure PVCs are backed by dynamically provisioned storage classes.
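A partitioned rollout sketch; the name, image, and partition value are illustrative. Lowering partition in steps updates ordinals from highest to lowest under your control:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                          # hypothetical workload
spec:
  serviceName: db
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2                  # only ordinals >= 2 update; lower to continue the rollout
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16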
5. Should autoscaler scale up during a stuck node drain?
The autoscaler scales up only in response to Pending pods, so without overprovisioning or buffer nodes, new capacity arrives after evicted pods are already unschedulable and can lag behind the drain.