- Operations Center does not respond and gets automatically restarted
- Managed Masters do not respond and get automatically restarted
- CloudBees Jenkins Enterprise
- CloudBees Jenkins Enterprise - Managed Master
- CloudBees Jenkins Enterprise - Operations Center
When CJOC or a Managed Master has a performance issue, the application is likely to stop responding to the health checks sent by Marathon. Marathon then automatically redeploys the application after the Grace Period defined in the CJOC or the Managed Master configuration.
For Managed Masters, the Grace Period can be configured in the Managed Master configuration, under the Health check section. When a performance issue occurs, we recommend significantly increasing the Grace Period so that you can collect the information needed for troubleshooting before Marathon automatically restarts the instance.
For Operations Center, the Grace Period cannot be configured with either the CJE CLI or the UI. To configure it, go to the Marathon UI, select the cjoc application, and click the Edit button under the Configuration tab. In the Health checks section, increase the Grace Period. To apply the changes, click the blue button labeled Change and deploy configuration. This change redeploys CJOC, so expect downtime.
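To double-check the Grace Period after the redeploy, the app definition can also be read from the Marathon REST API. The following is a minimal sketch, assuming Marathon is reachable from the machine where you run it and needs no extra authentication. The function name, the example URL, and the app id /cjoc are hypothetical placeholders, not values taken from this article. In the returned JSON, each entry under healthChecks carries a gracePeriodSeconds field.

```shell
# Sketch only: print the Marathon app definition so the healthChecks
# section (including gracePeriodSeconds) can be inspected.
# Assumption: the Marathon base URL and the app id are supplied by you;
# both values in the example call below are placeholders.
marathon_app_definition() (
  marathon_url="$1"   # e.g. http://marathon.example:8080 (hypothetical)
  app_id="$2"         # e.g. /cjoc (hypothetical)

  # GET /v2/apps/{appId} returns the app definition as JSON.
  # -f: fail on HTTP errors, -sS: quiet, but still report errors.
  curl -fsS "${marathon_url%/}/v2/apps/${app_id#/}"
)

# Example call (placeholders, adjust to your cluster):
#   marathon_app_definition http://marathon.example:8080 /cjoc
```

The function only reads the configuration. Changing the Grace Period is still done through the Marathon UI as described above.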
Once the problem is exposed, take several thread dumps with the command
cje run support-performance <mm_name> 300 5
For example, for Operations Center:
cje run support-performance cjoc 300 5
If the operation fails, you will need to go into the Docker container and manually run
jstack -l <JENKINS_PID>
We suggest taking at least 5 thread dumps while the issue is exposed. Attach the collected data to the ticket you have opened with CloudBees.
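If you have to fall back to the manual jstack approach, a small loop saves typing the command five times. This is a minimal sketch meant to be run inside the Docker container, assuming jstack is available on the PATH there. The function name, the default interval of 10 seconds, and the output file naming are assumptions made for illustration, not part of the CJE tooling.

```shell
# Sketch only: take several thread dumps of one JVM, one file per dump.
# Assumptions: jstack is on the PATH; names and defaults are illustrative.
take_thread_dumps() (
  pid="$1"              # PID of the Jenkins java process
  count="${2:-5}"       # number of dumps (the article suggests at least 5)
  interval="${3:-10}"   # seconds to wait between dumps (assumed default)
  out_dir="${4:-.}"     # directory that receives the dump files

  mkdir -p "$out_dir" || return 1

  i=1
  while [ "$i" -le "$count" ]; do
    # jstack -l adds lock information to the thread dump
    jstack -l "$pid" > "$out_dir/threaddump-$i.txt" || return 1
    # wait between dumps, but not after the last one
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
)

# Example call inside the container:
#   take_thread_dumps <JENKINS_PID> 5 10 /tmp/thread-dumps
```

The resulting threaddump-N.txt files can then be copied out of the container and attached to the ticket.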
After the problem is exposed, generate a support bundle from the UI, or use the latest one you can find under
When the problem is exposed, generate a CJE support bundle with the command
cje prepare pse-support
attaching ALL the controllers and the worker where the instance with the performance issue is running.
The latest update of this article was tested with: