How To Increase CloudBees High Availability Timeout

Issue

Environment

Description

Oftentimes, HA failover is a sign of an underlying issue. Commonly, long-running GC cycles that last longer than the default timeout (10s for versions lower than 2.303.2.5, 30s for version 2.303.2.5 and greater) are the root cause of these issues. Therefore, following the Best Practices is a must.

If you are experiencing HA failover too often, we encourage you to Submit a Support Request so we can diagnose the root cause.

The CloudBees High Availability (HA) Plugin uses JGroups, which reads a configurable jgroups.xml file that can live inside ${JENKINS_HOME}. If you do not have the file, you can download its reference version from Amazon S3.

The following <FD> node within jgroups.xml determines the timeout period before failover. The period is approximately timeout * max_tries, plus the <VERIFY_SUSPECT> timeout. Therefore, with the default settings:

<FD timeout="3000" max_tries="3"/><VERIFY_SUSPECT timeout="1500"/>

3000*3 (+1500) = 10500 ms, or ~10 seconds
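The arithmetic above can be checked with a short script. This is only a minimal sketch: the helper name failover_timeout_ms and the <config> wrapper element are hypothetical, and the fragment is assumed to carry no XML namespace, which may not hold for a real jgroups.xml.

```python
import xml.etree.ElementTree as ET

# Default fragment from this article, wrapped in a hypothetical <config>
# root so that it parses as a single XML document.
DEFAULT_FRAGMENT = (
    '<config>'
    '<FD timeout="3000" max_tries="3"/>'
    '<VERIFY_SUSPECT timeout="1500"/>'
    '</config>'
)


def failover_timeout_ms(xml_text):
    """Approximate failover period: FD timeout * max_tries (+ VERIFY_SUSPECT timeout)."""
    root = ET.fromstring(xml_text)
    fd = root.find("FD")
    verify = root.find("VERIFY_SUSPECT")
    total = int(fd.get("timeout")) * int(fd.get("max_tries"))
    if verify is not None:
        # VERIFY_SUSPECT adds one extra check before the member is declared dead.
        total += int(verify.get("timeout"))
    return total


print(failover_timeout_ms(DEFAULT_FRAGMENT))  # 10500
```

The result is in milliseconds, so 10500 corresponds to the roughly 10 second default.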

Resolution

If you have an immediate or exceptional need to increase the timeout, you can increase the values, for example by raising max_tries:

<FD timeout="3000" max_tries="10"/><VERIFY_SUSPECT timeout="1500"/>

3000*10 (+1500) = 31500 ms, or ~30 seconds
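To pick a max_tries value for a different target, the same formula can be inverted. The sketch below assumes that only max_tries is changed and that the <FD> and <VERIFY_SUSPECT> timeouts keep the values shown in this article; the helper name min_max_tries is hypothetical.

```python
import math


def min_max_tries(target_ms, fd_timeout_ms=3000, verify_suspect_ms=1500):
    """Smallest max_tries such that
    fd_timeout_ms * max_tries + verify_suspect_ms >= target_ms."""
    remaining = max(target_ms - verify_suspect_ms, 0)
    # At least one try is always needed, even for very small targets.
    return max(1, math.ceil(remaining / fd_timeout_ms))


print(min_max_tries(30000))  # 10
```

With a 30 second target this gives 10 tries, which matches the max_tries="10" setting shown above.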

This 30-second timeout became the default in release 2.303.2.5 with the change "Increased High Availability (HA) default timeout" (BEE-106).
