Issue
- We would like to apply patches to the workers / controllers
- We would like to apply patches to the workers / controllers periodically
Environment
- CloudBees Jenkins Enterprise (CJE) - AWS/Anywhere
Resolution
Updating the OS images used in CJE requires a restart of the components, namely the controllers and the workers, which can cause significant downtime. We recommend the approach explained in the article Strategy for Rolling OS Upgrades.
When opting for live upgrades instead, CJE is not aware of the changes applied, which can be a problem, especially in AWS environments. At the very least, health checks similar to those CJE performs must be carried out after the upgrade.
This article provides recommendations and guidance on the process and the actions to carry out.
AWS
(Note: in the case of AWS, changes applied live to workers / controllers vanish after a worker-restart or a controller-restart, because these operations reconstruct instances from an AMI ID. Live upgrades are therefore not recommended.)
Constraints:
- Ensure 2 controllers are available at all times
- (If a restart is required) Ensure that the IPs are preserved by using Reboot in EC2. Alternatively, if you are using Elastic IPs, you can do a Stop and Start of the EC2 instance (a minimal scripted sketch follows this list). See Differences Between Reboot, Stop, and Terminate for more information.
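For AWS, the reboot (or Elastic IP stop/start) can be scripted. The following is a minimal sketch, assuming boto3 is configured with suitable credentials; the region and instance ID are placeholders to replace with your own values.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
instance_id = "i-0123456789abcdef0"  # placeholder controller/worker instance ID

# A reboot keeps the instance on the same host, so its IP addresses are preserved.
ec2.reboot_instances(InstanceIds=[instance_id])

# If the instance uses an Elastic IP, a stop/start is also an option:
# ec2.stop_instances(InstanceIds=[instance_id])
# ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
# ec2.start_instances(InstanceIds=[instance_id])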
Recommendations:
- Ensure you have a strategy to back up and restore instances in case the upgrade goes wrong (for example, by creating an AMI of each instance beforehand; see the sketch below)
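One way to satisfy this on AWS is to create an AMI of each instance before patching it. The sketch below is illustrative only and assumes boto3 is configured; the region and instance ID are placeholders.

import datetime
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
instance_id = "i-0123456789abcdef0"  # placeholder instance ID

timestamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
image = ec2.create_image(
    InstanceId=instance_id,
    Name="cje-pre-upgrade-" + timestamp,
    Description="Backup taken before a live OS upgrade",
    NoReboot=True,  # do not reboot the instance while taking the image
)
print("Backup AMI:", image["ImageId"])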
Process:
- Perform the live upgrade on controllers one at a time: run the following process on one controller and, once it is validated, continue with the others. Otherwise, roll back the changes.
- Upgrade a controller
- (If a restart is required) Reboot the instance
- Check that the required services are running on the controller (it may take a few seconds for the services to start); a scripted service check sketch follows this list:
marathon
mesos-master
zookeeper
- Check that the subsystems are running:
docker
ntp
rsyslog
topbeat
- Check that the Mesos / Marathon UI is reachable
- Perform the live upgrade on workers one at a time: run the following process on one worker and, once it is validated, continue with the others. Otherwise, roll back the changes.
- Upgrade one worker
- (If a restart is required) Reboot the instance
- Check that the required services are running on the worker (it may take a few seconds for the services to start):
mesos-slave
- Check that the subsystems are running:
docker
ntp
rsyslog
topbeat
- (If applicable) Perform the live upgrade on the bastion
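The service checks listed in the process above can be scripted and run on the instance being validated. The following is a minimal sketch, assuming the services are managed by systemd and use the unit names listed above (names such as ntp may differ between distributions); adjust it to your environment.

import subprocess
import sys

CONTROLLER_SERVICES = ["marathon", "mesos-master", "zookeeper"]
WORKER_SERVICES = ["mesos-slave"]
SUBSYSTEMS = ["docker", "ntp", "rsyslog", "topbeat"]

def check(services):
    """Return the list of services that are not active."""
    failed = []
    for svc in services:
        # systemctl is-active exits with 0 only when the unit is active
        result = subprocess.run(["systemctl", "is-active", "--quiet", svc])
        print("{:<15} {}".format(svc, "running" if result.returncode == 0 else "NOT RUNNING"))
        if result.returncode != 0:
            failed.append(svc)
    return failed

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "controller"  # "controller" or "worker"
    services = CONTROLLER_SERVICES if role == "controller" else WORKER_SERVICES
    sys.exit(1 if check(services + SUBSYSTEMS) else 0)

Saved as, for example, check_services.py (a hypothetical name), it can be run with "python3 check_services.py controller" on a controller or "python3 check_services.py worker" on a worker; a non-zero exit code means at least one service is down.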
Anywhere
Constraints:
- Ensure 2 controllers are available at all times
- (If a restart is required) Ensure that the IPs / DNS hostnames are preserved (a minimal verification sketch follows this list)
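One way to verify that IPs / DNS hostnames survive a restart is to record what each hostname resolves to before the upgrade and compare afterwards. A minimal sketch, with hypothetical hostnames and addresses:

import socket

# hostname -> IP recorded before the restart (hypothetical values)
EXPECTED = {
    "controller-1.example.com": "10.0.1.10",
    "worker-1.example.com": "10.0.2.20",
}

for host, expected_ip in EXPECTED.items():
    current_ip = socket.gethostbyname(host)
    status = "OK" if current_ip == expected_ip else "MISMATCH"
    print("{}: expected {}, got {} -> {}".format(host, expected_ip, current_ip, status))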
Recommendations:
- Ensure you have a strategy to back up and restore instances in case the upgrade goes wrong
Process:
- Perform the live upgrade on controllers one at a time: run the following process on one controller and, once it is validated, continue with the others. Otherwise, roll back the changes.
- Upgrade a controller
- (If a restart is required) Restart the instance
- Check that the required services are running on the controller (it may take a few seconds for the services to start; the service check sketch above applies here as well):
marathon
mesos-master
zookeeper
- Check that the subsystems are running:
docker
ntp
rsyslog
topbeat
- Check that the Mesos / Marathon UI is reachable (a reachability sketch follows this list)
- Perform the live upgrade on workers one at a time: run the following process on one worker and, once it is validated, continue with the others. Otherwise, roll back the changes.
- Upgrade one worker
- (If a restart is required) Restart the instance
- Check that the required services are running on the worker (it may take a few seconds for the services to start):
mesos-slave
- Check that the subsystems are running:
docker
ntp
rsyslog
topbeat
- (If applicable) Perform the live upgrade on the bastion
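The Mesos / Marathon UI reachability check mentioned in both processes can also be scripted. The sketch below assumes the usual default ports (5050 for Mesos, 8080 for Marathon), plain HTTP and a hypothetical controller hostname; adjust it if the UIs sit behind TLS or authentication.

import urllib.request

CONTROLLER = "controller-1.example.com"  # hypothetical hostname
ENDPOINTS = {
    "Mesos UI": "http://{}:5050/".format(CONTROLLER),
    "Marathon UI": "http://{}:8080/".format(CONTROLLER),
}

for name, url in ENDPOINTS.items():
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            print("{}: HTTP {} ({})".format(name, response.getcode(), url))
    except Exception as error:
        print("{}: UNREACHABLE ({}): {}".format(name, url, error))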
Note: If the upgrade of the first controller / worker goes wrong, stop right there and investigate the problem. Do not proceed with the upgrade of the other controllers / workers.