Why is my restored controller not coming up?

Issue

After restoring a controller, the controller is unresponsive and the logs show continuous errors such as the following:

[Warning][Pod][controller-0][Unhealthy] Readiness probe failed: Get http://controller_url/login: dial tcp XXXX : connect: connection refused
...
...
[Warning][Pod][controller-0][Unhealthy] Liveness probe failed: HTTP probe failed with statuscode: 503

After several restart attempts, the controller eventually comes up and runs normally.

Environment

Resolution

When restoring a controller, expect startup to be slower than usual, because the process includes restoring the backup. While the data is being restored, the controller is not accessible, so any liveness probe configured for the controller fails and eventually forces a restart.

Given this, increase the probe-related settings so that the selected values exceed the restoration time. The values needed therefore depend on the size of the backup and the resources allocated to the controller.

Please find below some reference values:

  • Health Check Initial Delay (the “Number of seconds after the container has started before liveness probes are initiated.”): increase from 600 (10 minutes) to 1800 (30 minutes).
  • Readiness Initial Delay (the “Number of seconds after the container has started before readiness probes are initiated.”): increase from 30 (30 seconds) to 1800 (30 minutes).
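
As a rough way to reason about these values, the sketch below estimates how long a container can stay unresponsive before failed liveness probes trigger a restart, and suggests an initial delay from an estimated restore time. It is only an approximation under stated assumptions: the function names, the safety factor, and the example restore time are hypothetical, and the probe period (10 seconds) and failure threshold (3) used in the example are the Kubernetes defaults, which may differ from what your controller is actually configured with.

```python
import math


def liveness_grace_seconds(initial_delay, period, failure_threshold):
    """Approximate seconds from container start until consecutive liveness
    failures trigger a restart (ignores probe timeouts and kubelet jitter)."""
    return initial_delay + period * (failure_threshold - 1)


def suggested_initial_delay(estimated_restore_seconds, safety_factor=1.5, step=60):
    """Pad the estimated restore time by a safety factor (hypothetical value)
    and round up to the next multiple of `step` seconds."""
    padded = estimated_restore_seconds * safety_factor
    return int(math.ceil(padded / step) * step)


def covers_restore(initial_delay, period, failure_threshold, estimated_restore_seconds):
    """True if the approximate grace window is at least as long as the restore."""
    grace = liveness_grace_seconds(initial_delay, period, failure_threshold)
    return grace >= estimated_restore_seconds


if __name__ == "__main__":
    restore_estimate = 1200  # hypothetical: a restore that takes about 20 minutes

    # Initial delay of 600 with the assumed period and failure threshold
    print(covers_restore(600, 10, 3, restore_estimate))
    # Initial delay of 1800 with the same assumed period and failure threshold
    print(covers_restore(1800, 10, 3, restore_estimate))
    # A padded suggestion derived from the estimate
    print(suggested_initial_delay(restore_estimate))
```

If the estimated restore time is close to the resulting grace window, a larger safety factor is the safer choice, since the restore duration varies with backup size and allocated resources.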

You might also consider increasing the resource allocation for the controller (memory and CPU) to make the restore operation faster.

Workaround

The workaround is to wait through multiple restarts until the restore operation completes.

Tested product/plugin versions

References

You can also review How to Troubleshoot and Address Liveness / Readiness probe failure to get additional details on how to approach this issue in a more holistic way.
