Stopped Executors will not release from the Master

Issue

  • Encountering “dead” threads often
  • Executors are being held and not returned to the slave pool
  • Executors are shown as stopped and do not release from the master
  • Getting java.lang.IllegalStateException on specific jobs

Environment

  • CloudBees Jenkins Enterprise

Resolution

This is generally caused by doing a Reload Configuration from Disk while jobs are being executed.

There are two ways to avoid this happening again:

  1. Before reloading the configuration, put Jenkins into quiet mode from the CLI (quiet-down) and wait until all running jobs have finished. After the reload, cancel quiet mode from the CLI (cancel-quiet-down).
  2. Do not use Reload Configuration from Disk; restart the entire instance instead.
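
The quiet-mode sequence in option 1 can also be sketched as a Groovy function for the script console. This is a minimal sketch and not part of the original procedure: reloadWhenIdle and pollMillis are hypothetical names, the jenkins argument is assumed to be Jenkins.instance, and the sketch assumes that doQuietDown(), reload(), doCancelQuietDown(), and Computer.countBusy() behave as they do in Jenkins core.

```groovy
// Sketch: enter quiet mode, wait for running builds, reload, then leave quiet mode.
// `jenkins` is assumed to be jenkins.model.Jenkins.instance; it is left untyped
// so the function can be exercised without a live instance.
def reloadWhenIdle(jenkins, long pollMillis = 5000) {
    jenkins.doQuietDown()                    // stop new builds from being started
    try {
        // Poll until no regular executor on any computer is busy
        while (jenkins.computers.any { it.countBusy() > 0 }) {
            Thread.sleep(pollMillis)
        }
        jenkins.reload()                     // reload the configuration from disk
    } finally {
        jenkins.doCancelQuietDown()          // leave quiet mode even if the reload fails
    }
}
```

From the script console it would be called as reloadWhenIdle(jenkins.model.Jenkins.instance). The loop has no timeout, so a build that never finishes would keep it waiting; the finally block is there so that quiet mode is cancelled even if the wait is interrupted or the reload throws.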

If the problem is occurring now and you have stopped executors, use this script to find the dead executors and replace them with new ones:

/*** BEGIN META {
 "name" : "Restart Dead Executors",
 "comment" : "Searches for dead executors, discards them, and gets new ones.",
 "core": "1.609",
 "authors" : [
 { name : "Kuisathaverat" }
 ]
 } END META**/

import hudson.model.Computer
import hudson.model.Executor
import hudson.model.Node
import jenkins.model.Jenkins

Jenkins jenkins = Jenkins.instance
for (Node node in jenkins.nodes) {
  Computer computer = node.toComputer()
  // Skip nodes that have no computer or are offline
  if (computer == null || !computer.online) {
    println "Node '$node.nodeName' is currently offline - skipping check"
  } else {
    println "Node '$node.nodeName' is running"
    // A non-null cause of death means the executor thread has died
    for (Executor ex : computer.getExecutors()) {
      Throwable cause = ex.getCauseOfDeath()
      if (cause != null) {
        println '[Dead] ' + cause
        // Discard the dead executor so that a new one takes its place
        ex.doYank()
      }
    }
  }
}

If you are also having trouble building jobs because the next build number is incorrect, this script corrects it:

import hudson.model.Job
import jenkins.model.Jenkins

Jenkins.instance.getAllItems(Job.class).each { it ->
    // This does not work with matrix projects
    if (!(it instanceof hudson.matrix.MatrixConfiguration)) {
        int nextNumber;

        // If the next build number file is missing or does not hold a number, fall back to 1.
        try {
            nextNumber = Integer.parseInt(it.getNextBuildNumberFile().read().trim());
        } catch (java.io.FileNotFoundException e) {
            nextNumber = 1;
        } catch (NumberFormatException e) {
            nextNumber = 1;
        }

        println "Job: " + it.getFullName() + ". Next build number on file: " + nextNumber + ".";

        def largest = 0;
        // For each entry in the job's builds directory
        (it.getBuildDir().list() ?: []).each { name ->
            try {
                // Numbered build directories parse as integers; keep the largest one.
                def current = Integer.valueOf(name);
                if (current > largest) {
                    largest = current;
                }
            } catch (NumberFormatException e) {
                // Not a build number, move on.
            }
        }

        println "Largest build: " + largest + ". Expected next number: " + (largest + 1) + ". Current next number: " + nextNumber;
        // If the next build number does not match largest + 1
        if (nextNumber != largest + 1) {
            // Request largest + 1; Jenkins only applies a value higher than the current one
            it.updateNextBuildNumber(largest + 1);
            println "Next build number for job " + it.getFullName() + " is now " + it.nextBuildNumber;
        }
    } else {
        println "Ignore matrix projects";
    }
};
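
The core of that check, taking the largest numeric entry in the builds directory and adding one, can be tried outside Jenkins. The following is a minimal sketch under stated assumptions: expectedNextBuildNumber is a hypothetical helper, it takes plain directory entry names rather than a Job, and it treats a job with no numbered build directories as expecting build number 1.

```groovy
// Sketch: given the entry names found in a job's builds directory,
// return the next build number that directory implies.
int expectedNextBuildNumber(Collection<String> names) {
    // Keep purely numeric names only; entries such as "lastStableBuild" are skipped.
    // The 1-9 digit limit keeps every match within int range.
    def numbers = names.findAll { it ==~ /\d{1,9}/ }.collect { it.toInteger() }
    return (numbers ? numbers.max() : 0) + 1
}

println expectedNextBuildNumber(['1', '2', 'lastStableBuild', '10'])
```

The numbers are compared as integers rather than as strings, so 10 ranks above 2.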

Both scripts can be run from the Groovy script console under Manage Jenkins.
