Troubleshooting performance issues on Marathon - Mesos environment

Issue

  • Operations Center does not respond and gets automatically restarted
  • Managed Masters do not respond and get automatically restarted

Environment

Resolution

Increasing the Grace Period to capture the diagnosis information

When there is a performance issue in either Operations Center (CJOC) or a Managed Master, the application is likely to stop responding to the health checks sent by Marathon. Marathon then automatically redeploys the application after the Grace Period defined in the CJOC or Managed Master configuration.

For Managed Masters, the Grace Period can be configured in the Managed Master configuration, under the Health check section. When a performance issue is suspected, we recommend increasing the Grace Period significantly so that there is enough time to collect the troubleshooting information before Marathon automatically restarts the instance.

For Operations Center, the Grace Period cannot be configured with either the CJE CLI or the UI. To configure it, go to the Marathon UI, select the cjoc application, and click the Edit button under the Configuration tab. In the Health checks section, increase the Grace Period. To apply the change, click the blue button called Change and deploy configuration. This change redeploys CJOC, so some downtime should be expected.
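To confirm which Grace Period is currently in effect, the application definition can also be read through the Marathon REST API. The following is a minimal sketch, not a supported CJE procedure: the Marathon URL and the application ID are assumptions that depend on your cluster, and your Marathon endpoint may require authentication that is not shown here. The helper name show_grace_period is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the gracePeriodSeconds value(s) of a Marathon application.
# Assumptions: Marathon is reachable at the given URL without authentication,
# and the application ID is passed without a leading slash.

show_grace_period() {
  local marathon_url="$1"  # e.g. http://marathon.example.com:8080 (hypothetical)
  local app_id="$2"        # Marathon application ID, e.g. cjoc (assumption)

  # GET /v2/apps/<app_id> returns the application definition as JSON;
  # keep only the gracePeriodSeconds entries of its health checks.
  curl -fsS "${marathon_url}/v2/apps/${app_id}" \
    | grep -o '"gracePeriodSeconds"[[:space:]]*:[[:space:]]*[0-9]*'
}
```

For example, show_grace_period http://marathon.example.com:8080 cjoc prints one line per health check defined for the application.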

Take thread dumps

Once the problem is exposed, take several thread dumps with the command cje run support-performance <mm_name> 300 5, for example cje run support-performance cjoc 300 5. If the operation fails, you will need to go into the Docker container and manually run jstack -l <JENKINS_PID>. We suggest taking at least 5 thread dumps while the issue is exposed. Attach the collected data to the ticket you have opened with CloudBees.
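If you have to fall back to jstack, a small loop makes it easier to collect the suggested number of dumps at a regular interval. This is a minimal sketch that assumes jstack is available on the PATH inside the container and is run as the user that owns the Jenkins JVM. The function name, the output file names, and the default values are illustrative, not part of the CJE tooling.

```shell
#!/usr/bin/env bash
# Sketch: collect several thread dumps from the Jenkins JVM with jstack.
# Run it inside the Docker container of the affected instance.

take_thread_dumps() {
  local pid="$1"            # PID of the Jenkins JVM (<JENKINS_PID>)
  local count="${2:-5}"     # number of thread dumps to take
  local interval="${3:-5}"  # seconds to wait between two dumps
  local out_dir="${4:-.}"   # directory where the dumps are written
  local i

  mkdir -p "$out_dir" || return 1
  for ((i = 1; i <= count; i++)); do
    # -l adds lock information to each dump
    jstack -l "$pid" > "$out_dir/threaddump-$i.txt" || return 1
    # Wait between dumps, but not after the last one
    if ((i < count)); then
      sleep "$interval"
    fi
  done
}
```

For example, take_thread_dumps <JENKINS_PID> 5 5 /tmp/threaddumps writes five files that can then be copied out of the container and attached to the ticket.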

Generate a Support Bundle from the UI

After the problem is exposed, generate a support bundle from the UI, or collect the latest one you can find under $JENKINS_HOME/support.
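If the UI is not reachable, the most recent bundle on disk can be located from a shell on the instance. This is a minimal sketch that assumes the bundles are stored as .zip files directly under $JENKINS_HOME/support. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the path of the most recently modified support bundle.
# Assumption: bundles are .zip files directly under $JENKINS_HOME/support.

latest_support_bundle() {
  local support_dir="${1:-$JENKINS_HOME/support}"

  # ls -t sorts by modification time, newest first
  ls -t "$support_dir"/*.zip 2>/dev/null | head -n 1
}
```

Calling latest_support_bundle without arguments uses $JENKINS_HOME/support. It prints nothing if no bundle is found.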

Generate a CJE Support Bundle

When the problem is exposed, generate a CJE support bundle with the command cje prepare pse-support, attaching ALL the controllers and the worker on which the instance with the performance issue is running.

Tested products/plugins version

The latest update of this article was tested with:
