- A large number of pods are stuck waiting in the scheduling queue and end up in Failed or Pending status, while the Cluster Autoscaler reports:
"Pod randomPodName marked as unschedulable can be scheduled on ip-XXX-XX-XX-XX.ec2.internal. Ignoring in scale up."
This is caused by an issue in Kubernetes and the Cluster Autoscaler in versions prior to 1.11.7.
- As a temporary workaround, taint the node named in the log message so that Kubernetes no longer schedules pods on it.
The taint prevents new pods (those without a matching toleration) from being scheduled on that node, and after a few seconds the autoscaler should start scaling up correctly.
- As a long-term resolution, upgrade to Kubernetes 1.11.7 or later and a supported Cluster Autoscaler version,
following the guidelines in the related issue, "Autoscaler fails to scale up nodes with pending pods".
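The steps above can be sketched as a small helper: check whether a version string falls below the 1.11.7 threshold, pull the node name out of the "Ignoring in scale up" log line, and build the `kubectl taint` command for the workaround (plus the matching removal command once the upgrade is done). This is a minimal sketch, not an official tool. It assumes the version looks like `v1.11.5` (possibly with a provider suffix) and that the log line has the shape shown above. The taint key/value `scaleup-workaround=true` is hypothetical and chosen only for illustration. The helper only builds the command; it does not run it.

```python
import re
import shlex
from typing import List, Optional

# Threshold taken from the note above; assumed to apply to a version string
# such as "v1.11.5", possibly followed by a provider-specific suffix.
FIXED_VERSION = (1, 11, 7)

# Hypothetical taint key/value used for illustration only.
TAINT = "scaleup-workaround=true:NoSchedule"

_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")
# The word "node" is optional so the pattern tolerates small wording differences.
_LOG_RE = re.compile(r"can be scheduled on (?:node )?(\S+)\. Ignoring in scale up")


def is_affected(version: str) -> bool:
    """Return True if the version is older than FIXED_VERSION."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups()) < FIXED_VERSION


def node_from_log(line: str) -> Optional[str]:
    """Extract the node name from an 'Ignoring in scale up' log line, or None."""
    match = _LOG_RE.search(line)
    return match.group(1) if match else None


def taint_command(node: str, remove: bool = False) -> List[str]:
    """Build (but do not run) the kubectl command that adds or removes the taint."""
    # A trailing '-' on the taint spec tells `kubectl taint` to remove it.
    spec = TAINT + "-" if remove else TAINT
    return ["kubectl", "taint", "nodes", node, spec]


if __name__ == "__main__":
    log = ("Pod randomPodName marked as unschedulable can be scheduled on "
           "ip-XXX-XX-XX-XX.ec2.internal. Ignoring in scale up.")
    node = node_from_log(log)
    if node is not None and is_affected("v1.11.5"):
        print(shlex.join(taint_command(node)))
```

Building the argument list instead of shelling out keeps the sketch side-effect free; the printed command can be reviewed and run by hand, and `taint_command(node, remove=True)` gives the command to undo the workaround after upgrading.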