How to Troubleshoot and Address Liveness / Readiness Probe Failures

Symptoms

  • Managed Master is failing, its container is being restarted, and the Managed Master item log shows "Liveness probe failed: HTTP probe failed with statuscode: 503" or "Liveness probe failed: Get http://$POD_IP:8080/$MASTER_NAME/login: dial tcp $POD_IP:8080: connect: connection refused"
  • Managed Master is failing, its container is being restarted, and the Managed Master item log shows "Readiness probe failed: HTTP probe failed with statuscode: 503" or "Readiness probe failed: Get http://$POD_IP:8080/$MASTER_NAME/login: dial tcp $POD_IP:8080: connect: connection refused"
  • Managed Master takes a long time to start and eventually fails due to the Liveness or Readiness probe

Environment

Diagnostic/Treatment

Preconditions

Liveness / Readiness probe failures are caused by Jenkins not responding to a health check, which is currently an HTTP request to http://$POD_IP:8080/$MASTER_NAME/login. These failures occur when Jenkins suffers from performance issues and is unresponsive for too long. In most cases, this happens on startup.
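To reproduce what the probe sees, the same URL can be requested by hand from a machine that can reach the pod. The following is a minimal sketch, not the probe implementation itself: the function name probe_once and its opener parameter are hypothetical, and it assumes the usual Kubernetes rule that an HTTP probe passes on any status from 200 to 399.

```python
import urllib.error
import urllib.request


def probe_once(url, timeout_seconds=10, opener=urllib.request.urlopen):
    """Send one GET request and report it roughly the way an HTTP probe would.

    Returns (passed, detail). A status in the 200-399 range counts as a pass.
    `opener` is only there so the request can be replaced in tests.
    """
    try:
        with opener(url, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx answers, e.g. the 503 shown in the symptoms.
        status = err.code
    except OSError as err:
        # Connection refused, unreachable host, or timeout: no HTTP answer at all.
        return False, f"no response: {err}"
    passed = 200 <= status < 400
    return passed, f"HTTP status {status}"


# Example call (placeholder address and master name, replace with real values):
# print(probe_once("http://10.0.0.12:8080/my-master/login"))
```

A result of (False, "HTTP status 503") corresponds to the "statuscode: 503" symptom, and a "no response" detail corresponds to the "connection refused" symptom.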

Before troubleshooting any further, we recommend going through the following recommendations, which address common causes.

Review Resource Requirements

In a containerized environment, it is important that Jenkins gets the resources it needs:

  • Ensure that appropriate container Memory and CPUs are given to the Master (see the “Jenkins Master Memory in MB” and “Jenkins Master CPUs” fields of the Managed Master configuration)

Note: see also the Master Sizing Guidelines

Review Startup Performances Preconditions

If the probe fails on startup, also review How to Troubleshoot and Address Jenkins Startup Performances - Preconditions

Workarounds

Liveness / Readiness probe failures suggest performance issues or a slow startup. A quick workaround for this kind of issue is to update the probes to give Jenkins more slack to start or to respond. Which probe setting to tweak depends on the nature of the problem: is the probe failing on startup or while Jenkins is running?

A Probe fails on Startup

If a probe fails while a Managed Master is starting, a quick workaround is to give Jenkins more time to start. Note that the Liveness probe is the one that causes restarts: when it fails, the container is restarted.

Increase the Initial Delay of the Liveness Probe

To increase the Liveness probe initial delay, configure the Managed Master item and update the value of “Health Check Initial Delay”. By default, it is set to 600 (10 minutes). You may increase it to, for example, 1800 (30 minutes).

Increase the Failure Threshold of the Readiness Probe

To increase the readiness probe failure threshold, configure the Managed Master item and update the value of “Readiness Failure Threshold”. By default, it is set to 100 (100 times). You may increase it to, for example, 300.
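How much extra time these two settings buy can be estimated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 10-second probe period is an assumption (it is the Kubernetes default for periodSeconds; the period actually configured for a Managed Master is not stated in this article and may differ).

```python
def readiness_window_seconds(failure_threshold, period_seconds=10):
    """Approximate time covered by `failure_threshold` consecutive failed checks.

    period_seconds=10 is an assumed value, not one taken from this article.
    """
    return failure_threshold * period_seconds


# Readiness Failure Threshold: default (100) and the example increase (300).
for threshold in (100, 300):
    window = readiness_window_seconds(threshold)
    print(f"threshold {threshold}: roughly {window} s of consecutive failures tolerated")

# Health Check Initial Delay: default (600) and the example increase (1800).
for delay in (600, 1800):
    print(f"initial delay {delay} s: first Liveness check after {delay // 60} min")
```

If the real period differs, pass it as period_seconds to get a figure that matches the environment.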

A Probe fails while Jenkins is running

If a probe fails while a Managed Master is running, it is concerning because it suggests that the master was unresponsive for minutes. In such cases, increasing the probe timeouts can help keep the unresponsive master up longer so that data can be collected.

Increase the Timeout of the Liveness Probe

To increase the Liveness probe timeout, configure the Managed Master item and update the value of “Health Check Timeout”. By default, it is set to 10 (10 seconds). You may increase it to, for example, 30 (30 seconds).

Increase the Timeout of the Readiness Probe

To increase the Readiness probe timeout, configure the Managed Master item and update the value of “Readiness Timeout”. By default, it is set to 5 (5 seconds). You may increase it to, for example, 30 (30 seconds).
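Before choosing new timeout values, it can help to measure how long the health check URL actually takes to answer. The following is a rough sketch under stated assumptions: the function names and the opener and clock parameters are hypothetical helpers, and the two constants are the default timeouts quoted above.

```python
import time
import urllib.error
import urllib.request

LIVENESS_TIMEOUT = 10  # default "Health Check Timeout" quoted in this article
READINESS_TIMEOUT = 5  # default "Readiness Timeout" quoted in this article


def time_request(url, max_wait=60, opener=urllib.request.urlopen, clock=time.monotonic):
    """Return how long one GET took in seconds, or None if no answer arrived."""
    start = clock()
    try:
        with opener(url, timeout=max_wait):
            pass
    except urllib.error.HTTPError:
        pass  # an error status is still an answer; only the delay matters here
    except OSError:
        return None  # connection refused, unreachable, or no answer within max_wait
    return clock() - start


def timeouts_exceeded(elapsed):
    """List which default probe timeouts a response taking `elapsed` seconds would miss."""
    if elapsed is None:
        return ["liveness", "readiness"]
    exceeded = []
    if elapsed > LIVENESS_TIMEOUT:
        exceeded.append("liveness")
    if elapsed > READINESS_TIMEOUT:
        exceeded.append("readiness")
    return exceeded


# Example call (placeholder address and master name, replace with real values):
# print(timeouts_exceeded(time_request("http://10.0.0.12:8080/my-master/login")))
```

Repeating the measurement while the master is slow gives an idea of whether a value such as 30 seconds would be enough.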

Data Collection

Although updating the probe configuration can help to get the master started, it is important to troubleshoot the root cause of the problem, which is usually related to performance.

Failure on startup

If a probe fails while the Managed Master is starting:

Failure while Jenkins is running

If a probe fails while the Managed Master is running:

References
