My Master will not get new Shared Agents / Shared Cloud Agents

Issue

  • My master will no longer provision shared agents or shared cloud agents from the Operations Center
  • The Operations Center continues to lease out shared cloud agents / shared agents
  • When you try to force release a shared agent, another one is leased
  • The master does not show any executor available
  • You cannot delete the shared cloud / shared agent and add a new one because it is “In Use”
  • The Operations Center logs show an exception similar to the following:
2020-02-05 06:53:24.224+0000 [id=18029912]  WARNING c.c.o.s.p.SlaveLeaseTable#registerRequest: Failed to register request for owner: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with leaseId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
org.h2.jdbc.JdbcSQLException: [...]
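
To check whether a log contains this pattern, a small script can look for the warning line followed directly by the H2 exception. The following is a minimal sketch, assuming the log has already been read into a list of lines; the function name `find_lease_db_errors` is hypothetical, and the patterns are taken only from the excerpt above.

```python
import re

# Patterns copied from the log excerpt in this article.
LEASE_WARNING = re.compile(r"SlaveLeaseTable#registerRequest: Failed to register request")
H2_EXCEPTION = "org.h2.jdbc.JdbcSQLException"


def find_lease_db_errors(lines):
    """Return each lease warning line that is immediately followed by an H2 exception line."""
    hits = []
    for index, line in enumerate(lines[:-1]):
        next_line = lines[index + 1]
        if LEASE_WARNING.search(line) and next_line.startswith(H2_EXCEPTION):
            hits.append(line.rstrip("\n"))
    return hits
```

The lines would come from the Operations Center log, whose location depends on the installation. A non-empty result only indicates that the symptom is present; it does not by itself prove that the database is corrupted.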

Environment

Explanation

The problem may be caused by a corrupted database in the Operations Center (the database that manages the shared agent leases). The stack trace shown above is evidence of this. In some cases, the corruption can be caused by an unresponsive file system (for example, when $JENKINS_HOME is mounted on a shared file system).

Resolution

Use this procedure only if no other way can be found to reconnect the shared cloud / shared agents.

  1. Stop the Jenkins master
  2. Stop the Operations Center
  3. Remove the file $JENKINS_HOME/run-time-state.h2.db from the Operations Center
  4. Restart the Operations Center
  5. Observe that all the agents have disconnected completely from the master
  6. Restart the master
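
A cautious variation on step 3 is to rename the database file instead of deleting it, so that it can still be inspected later. This is a minimal sketch and not part of the documented procedure: the function name `set_aside` and the `.bak-<timestamp>` suffix are assumptions made for illustration, and it should only be run while the Operations Center is stopped.

```python
import time
from pathlib import Path


def set_aside(jenkins_home, file_name, stamp=None):
    """Rename jenkins_home/file_name to file_name.bak-<stamp>.

    Returns the new path, or None if the file does not exist.
    The ".bak-" naming scheme is an assumption, not a Jenkins convention.
    """
    source = Path(jenkins_home) / file_name
    if not source.is_file():
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.bak-{stamp}")
    source.rename(target)
    return target
```

For example, `set_aside(jenkins_home, "run-time-state.h2.db")` would be called with the Operations Center's $JENKINS_HOME directory. Because the file is renamed within the same directory, it no longer exists under its original name when the Operations Center restarts.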

Note: If there were many items in the queue when the master was stopped and the master has trouble coming back up, try clearing the master’s queue:

  • Stop the master
  • Move / Remove the file $JENKINS_HOME/queue.xml
  • Start the master

Disk Contention Scenario

If the message of the JdbcSQLException reports an IOException such as the following, the cause may well be I/O contention on disk:

org.h2.jdbc.JdbcSQLException: IO Exception: "java.io.IOException: Stream Closed"

When the $JENKINS_HOME of the Operations Center is mounted on a network file system such as NFS, make sure that the file system is responding and that it uses a supported NFS version. Then restart the Operations Center (CJOC).
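
One rough way to check that the file system is responding is to time a small synced write in the directory. The sketch below assumes write access to that directory; the function names, the `.fs-probe-` file prefix and the 5 second default timeout are all hypothetical choices, not values from any CloudBees tool.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def _write_probe(directory):
    """Create, fsync and delete a small temporary file; return elapsed seconds."""
    start = time.monotonic()
    with tempfile.NamedTemporaryFile(dir=directory, prefix=".fs-probe-") as handle:
        handle.write(b"probe")
        handle.flush()
        os.fsync(handle.fileno())  # force the write through to the storage backend
    return time.monotonic() - start


def probe_file_system(directory, timeout_seconds=5.0):
    """Return the seconds taken by a small synced write, or None if it timed out."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(_write_probe, directory)
    try:
        return future.result(timeout=timeout_seconds)
    except TimeoutError:
        return None
    finally:
        pool.shutdown(wait=False)  # do not wait here for a write that may be hung
```

A result of None, or a time of several seconds for a few bytes, suggests the mount is struggling. Note that a write stuck on a hung mount can still prevent the Python process from exiting cleanly, so this is a diagnostic aid rather than a monitoring solution.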
