Issue
Under certain circumstances, Jenkins may “hang”, with the following symptoms:
- The Jenkins Java process is running in a waiting state.
- Jenkins is effectively down.
- Nothing is logged.
Sometimes, after numerous restarts, the Jenkins service may start up again normally.
The root cause for this issue is that the Jenkins service hangs immediately before it forks the child process that starts Jetty and Jenkins. Although the Java process is running, nothing is logged, because Jenkins has not yet started and is not yet listening on any port.
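One way to confirm this symptom is to check that a Jenkins Java process exists while nothing is listening on its HTTP port. The following is a minimal sketch, not part of any CloudBees tooling: port_is_listening is a hypothetical helper name, and it expects the output of ss -ltn on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the `ss -ltn` output on stdin contains
# a LISTEN socket whose address field ends in :<port>.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            for (i = 2; i <= NF; i++) {
                # Split on ":" so that IPv6 addresses such as [::]:8080 also work.
                n = split($i, parts, ":")
                if (n > 1 && parts[n] == port) { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, ss -ltn | port_is_listening 8080 || echo "nothing is listening on 8080" reports the hang symptom when pgrep -f jenkins still shows a running process. Port 8080 is an assumption here; substitute the port your instance is configured to use.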
Environment
NOTE: This issue affects a very small number of CloudBees customers. You only need to take action if you are directly affected; if you are not experiencing this issue, no action is necessary.
Customers most likely to be affected by this issue are:
- Using the init.d scripts with the --daemon argument.
- Running a 3.10.0 kernel.
- Using RPM-based distributions based on RHEL or CentOS 7.
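The first two risk factors above can be checked from a shell. This is a minimal sketch under stated assumptions: both function names are hypothetical, and the init script path you pass in depends on your product.

```shell
#!/bin/sh
# Hypothetical helpers for checking the risk factors listed above.

# Succeeds if the kernel release string (for example, from `uname -r`)
# belongs to a 3.10.0 kernel.
kernel_is_3_10_0() {
    case "$1" in
        3.10.0|3.10.0-*|3.10.0.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the given init script contains the --daemon argument.
script_uses_daemon_flag() {
    grep -q -e '--daemon' "$1"
}
```

For example, kernel_is_3_10_0 "$(uname -r)" && script_uses_daemon_flag /etc/init.d/jenkins succeeds only when both conditions hold. Matching these conditions does not prove that an instance will hang; it only indicates that the instance fits the profile described above.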
The issue affects the following platforms:
- CloudBees CI (CloudBees Core) on traditional platforms
- CloudBees Jenkins Platform
- CloudBees Jenkins Distribution
- Jenkins LTS
Resolution
Upgrade to:
- CloudBees CI on traditional platforms 2.289.2.3 or newer. We strongly recommend upgrading to at least CloudBees CI on traditional platforms 2.303.3.3, which also includes the fix for a regression affecting RPM JENKINS_JAVA_OPTIONS (BEE-9272).
- Jenkins LTS 2.332.1
We can help plan and partner with you on your upgrade, via an Assisted Update.
Workaround
To work around this issue:
- Log in to the server that is running your product instance.
- Stop and back up your instance, as detailed in the Backup and Restore guide.
- On the server running the instance, look in the /etc/init.d directory for a file that matches your distribution:
  - jenkins for CloudBees Jenkins Platform Client controller instances
  - jenkins-oc for CloudBees Jenkins Platform Operations Center instances
  - cloudbees-jenkins-distribution for CloudBees Jenkins Distribution instances
  - cloudbees-core-cm for CloudBees Core on Traditional Platforms Client controller instances
  - cloudbees-core-oc for CloudBees Core on Traditional Platforms Operations Center instances
- Make a note of this filename. For example, for CloudBees Jenkins Distribution, the file is /etc/init.d/cloudbees-jenkins-distribution.
- In the file matching your distribution, search for the line reading -Dcb.distributable.commit_sha=[hashToUse]. Make a note of the hashToUse value.
- In a temporary directory, create a new file and populate it with the contents of the file https://cloudbees-jenkins-scripts.s3.amazonaws.com/e206a5-linux/systemInit.sh.
- In the file you just created, find all occurrences of the string @@ARTIFACTNAME@@ and replace them with your installed product identifier, which is one of:
  - jenkins for CloudBees Jenkins Platform Client controller instances
  - jenkins-oc for CloudBees Jenkins Platform Operations Center instances
  - cloudbees-jenkins-distribution for CloudBees Jenkins Distribution instances
  - cloudbees-core-cm for CloudBees Core on Traditional Platforms Client controller instances
  - cloudbees-core-oc for CloudBees Core on Traditional Platforms Operations Center instances
- Locate and replace all occurrences of the string @@COMMIT_SHA@@ with the hashToUse value you previously looked up.
- Back up the /etc/init.d file matching your distribution, and move it to a safe location.
- If you have modified the /etc/init.d file, for example by adding or removing Java parameters, make sure the same modifications are made in the new script file.
- Make the script file you created earlier executable (if needed) and execute it.
- This action should generate a .service file with the name of your distribution. For example, for CloudBees Jenkins Distribution, the resulting file should be cloudbees-jenkins-distribution.service in the local directory.
- Copy the newly generated file into the /etc/systemd/system folder.
- Restart the instance using the systemctl command:
systemctl daemon-reload && systemctl start <name of service>
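The lookup and placeholder-substitution steps above can be scripted rather than edited by hand. The sketch below is illustrative only: the function names are hypothetical, it assumes the hashToUse value consists only of letters and digits, and it assumes neither replacement value contains "/" or "&".

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Print the hashToUse value found in the given init script.
# Assumes the hash consists only of letters and digits.
extract_commit_sha() {
    grep -o -e '-Dcb\.distributable\.commit_sha=[0-9A-Za-z]*' "$1" \
        | head -n 1 \
        | cut -d= -f2
}

# Print the template ($1) with both placeholders replaced by the
# product identifier ($2) and the commit hash ($3).
# Assumes neither value contains "/" or "&", which are special to sed.
fill_template() {
    sed -e "s/@@ARTIFACTNAME@@/$2/g" \
        -e "s/@@COMMIT_SHA@@/$3/g" "$1"
}
```

For CloudBees Jenkins Distribution, usage might look like sha=$(extract_commit_sha /etc/init.d/cloudbees-jenkins-distribution) followed by fill_template systemInit.sh cloudbees-jenkins-distribution "$sha" > systemInit.filled.sh, where systemInit.filled.sh is an arbitrary output name. Review the resulting file, and carry over any custom changes from your init script, before executing it.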
Troubleshooting
If you want to roll back the changes:
- Remove the /etc/systemd/system file you previously added.
- Restore the /etc/init.d script that you saved to an alternate location.
- Reload the systemd configuration using the command:
systemctl daemon-reload
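The first two rollback steps can be sketched as a small helper. This is a hedged example, not an official script: rollback_service is a hypothetical name, and UNIT_DIR and INIT_DIR are illustrative overrides that exist only so the function can be tried outside /etc.

```shell
#!/bin/sh
# Hypothetical helper: remove the added unit file and restore the saved
# init script. $1 is the service name, $2 is the path to the backup copy.
rollback_service() {
    name=$1
    backup=$2
    # Illustrative overrides; default to the directories named in this article.
    unit_dir=${UNIT_DIR:-/etc/systemd/system}
    init_dir=${INIT_DIR:-/etc/init.d}

    rm -f "$unit_dir/$name.service" || return 1
    cp -p "$backup" "$init_dir/$name" || return 1
}
```

For example, rollback_service cloudbees-jenkins-distribution /root/backup/cloudbees-jenkins-distribution && systemctl daemon-reload, where the backup path is an assumption standing in for wherever you saved the original script.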
Background
The RPM packaging for RHEL and CentOS systems calls the OS daemon command from the /etc/init.d scripts used to manage the instance.
Jenkins-based products used the Akuma library (invoked with the --daemon flag) to make Jenkins daemonize itself in a way that the OS daemon command could handle properly.
On recent versions of RHEL and CentOS operating systems, the Akuma library no longer worked as expected upon restarts, and instead could cause an infinite loop while attempting to daemonize the instance.
One possible workaround was to simply remove the --daemon flag, but this was an incomplete solution: it fixed the error but caused the underlying /etc/init.d script to become nonfunctional.
The better solution is to use a native systemd service file to manage the daemon, by upgrading to our newer product releases, as per the “Resolution” section of this article.