Controller/Operations center pods stuck in terminating state


When a worker node crashes, the Controller/Operations center pods running on it are not moved to a different worker node.
Instead, they get stuck in the Terminating or Unknown state:

kubectl get pods -n $NAMESPACE -o wide | grep -i Terminating
ControlerTest1-0 1/1 Terminating 0 7d11h somenode1 <none> <none>
ControlerTest2-0 1/1 Terminating 0 7d11h somenode1 <none> <none>
ControlerTest3-0 1/1 Terminating 0 7d10h somenode1 <none> <none>



This happens when a Kubernetes worker node loses connectivity to the API server. Kubernetes (version 1.5 or newer) does not delete Pods just because a Node is unreachable.
The Pods running on an unreachable Node enter the ‘Terminating’ or ‘Unknown’ state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node.
In this case the Pod object still remains in the API server, so no replacement Pod is scheduled: a StatefulSet guarantees that at most one Pod with a given identity (for example ControlerTest1-0) exists in the cluster, and it will not create a new Pod with the same name until the old record is gone.
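Before choosing one of the actions below, it helps to confirm that the node hosting the stuck pods really is unreachable. The following is a minimal sketch, assuming `$NAMESPACE` is set as in the command above and kubectl is configured for the cluster; `list_stuck_pods` and `node_ready_status` are hypothetical helper names. It relies on two Kubernetes behaviours: a Pod that kubectl shows as Terminating (or Unknown after a deletion request) has `metadata.deletionTimestamp` set, and when a kubelet stops reporting, the node controller sets the node's `Ready` condition to `Unknown` (displayed as `NotReady` by `kubectl get nodes`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Pods in $NAMESPACE that have a deletion timestamp
# (shown by kubectl as Terminating/Unknown), with the node they are bound to.
list_stuck_pods() {
  kubectl get pods -n "$NAMESPACE" --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,DELETED:.metadata.deletionTimestamp \
    | awk '$3 != "<none>" {print $1, $2}'
}

# Hypothetical helper: print the status of the node's Ready condition.
# "True" means the kubelet is reporting; "Unknown" means contact was lost.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

For example, `list_stuck_pods` would print `ControlerTest1-0 somenode1` for the pods above, and `node_ready_status somenode1` should print `Unknown` if that node has lost contact with the API server.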


A Pod in the Terminating state is removed only through one of the actions below. Once its record is removed from the API server, the StatefulSet can recreate the Pod, and it can be scheduled on a healthy node.

  • The Node object is deleted (either by you, or by the Node Controller).
  • The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the API server.
  • Force deletion of the Pod by the user (kubectl delete pods <pod-name> -n $NAMESPACE --grace-period=0 --force). This should be used only as a last resort.
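The force-deletion option can be sketched as a small shell function, assuming the same `$NAMESPACE` variable; `force_delete_stuck_pods` and the `DRY_RUN` switch are hypothetical names used only for illustration. The function refuses to act while the node still reports `Ready=True`, because force deleting a StatefulSet Pod whose node is in fact still running it can leave two Pods with the same identity. That is why force deletion is a last resort.

```shell
#!/usr/bin/env bash
# Hypothetical last-resort cleanup: force delete every Pod in $NAMESPACE that is
# stuck with a deletion timestamp on node $1, but only if that node is not Ready.
# Set DRY_RUN=1 (hypothetical switch) to print the commands instead of running them.
force_delete_stuck_pods() {
  local node="$1" ready pod
  ready=$(kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  if [ "$ready" = "True" ]; then
    echo "node $node is Ready; let the kubelet finish terminating its pods" >&2
    return 1
  fi
  # Select pods bound to this node whose deletionTimestamp is set.
  kubectl get pods -n "$NAMESPACE" --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,DELETED:.metadata.deletionTimestamp \
    | awk -v n="$node" '$2 == n && $3 != "<none>" {print $1}' \
    | while read -r pod; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
          echo kubectl delete pod "$pod" -n "$NAMESPACE" --grace-period=0 --force
        else
          kubectl delete pod "$pod" -n "$NAMESPACE" --grace-period=0 --force
        fi
      done
}
```

Running `DRY_RUN=1 force_delete_stuck_pods somenode1` first lets you review the exact commands before anything is deleted. The first option, deleting the Node object, is a single `kubectl delete node <node-name>` and is not wrapped here.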


