Swarm disconnection due to already connected slave

Issue

  • Swarm slaves disconnect
  • Swarm slave already connected

Environment

  • Jenkins Enterprise
  • Swarm plugin < 1.22

Resolution

The Swarm plugin’s PluginImpl.doCreateSlave method assigns a node an alternative name if there is a name conflict, but it never tells the swarm client about the name change.

This means that the swarm client tries to connect using the name that it thinks it has, and then that connection gets rejected as there is already a slave connected with that name.
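The sequence described above can be illustrated with a small toy model. This is only a sketch of the behaviour, not the Swarm plugin's actual code: `ToyMaster`, `create_slave`, `connect`, and `simulate` are hypothetical names, and the `<name>-<ip>` renaming rule is an assumption based on the node names shown in this article.

```python
class ToyMaster:
    """Toy stand-in for the master; NOT the Swarm plugin's real implementation."""

    def __init__(self):
        # node name -> True once a client has connected under that name
        self.nodes = {}

    def create_slave(self, requested_name, client_ip):
        # On a name conflict, register the node under an alternative name
        # (assumed pattern: "<name>-<ip>").
        name = requested_name
        if name in self.nodes:
            name = f"{requested_name}-{client_ip}"
        self.nodes.setdefault(name, False)
        # The assigned name is deliberately not returned, mirroring the bug.

    def connect(self, name):
        # Reject the connection if a client is already connected as `name`.
        if self.nodes.get(name):
            return False
        self.nodes[name] = True
        return True


def simulate():
    """Two clients both ask for the name SLAVE1 and then try to connect."""
    master = ToyMaster()
    results = []
    for ip in ["10.0.0.0", "10.0.0.1"]:
        master.create_slave("SLAVE1", ip)
        # Each client still believes its name is SLAVE1.
        results.append(master.connect("SLAVE1"))
    return results, sorted(master.nodes)
```

In this model the first client connects, the second is rejected, and the renamed node is left behind without a connection.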

If you encounter swarm slave disconnects, one place to look is the names of the swarm slaves in a generated support bundle. In the `nodes/slaves` folder you will see entries such as:

  • SLAVE1-10.0.0.1 (hudson.plugins.swarm.SwarmSlave)
  • SLAVE1-10.0.0.2 (hudson.plugins.swarm.SwarmSlave)
  • SLAVE1-10.0.0.3 (hudson.plugins.swarm.SwarmSlave)

while

  • SLAVE1 (hudson.plugins.swarm.SwarmSlave)

is already connected and holds the actual connection. This can confuse the Jenkins instance because the new slave name is never returned to the swarm client.
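When a bundle contains many nodes, a short script can help spot this pattern. The following is a minimal sketch, assuming that renamed nodes follow the `<name>-<IPv4 address>` form shown above and that the original `<name>` node is also present. The function name `find_renamed_nodes` is hypothetical.

```python
import re
from collections import defaultdict

# Matches names such as "SLAVE1-10.0.0.1" and captures the base name.
# Assumption: the alternative name is the base name plus "-<IPv4 address>".
IP_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<ip>\d{1,3}(?:\.\d{1,3}){3})$")


def find_renamed_nodes(names):
    """Group IP-suffixed node names under the base name they conflict with."""
    names = set(names)
    renamed = defaultdict(list)
    for name in sorted(names):
        match = IP_SUFFIX.match(name)
        # Only report a node if the un-suffixed base name also exists.
        if match and match.group("base") in names:
            renamed[match.group("base")].append(name)
    return dict(renamed)
```

If the entries under `nodes/slaves` in your bundle are named after the nodes (an assumption to check against your own bundle), the list of names could come from `os.listdir("nodes/slaves")`.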

To prevent this, each slave machine needs a unique name every time it is created. Otherwise a name conflict occurs, and the swarm client has no way to learn the alternative name it was assigned.
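One possible way to obtain such a name is to append a random suffix when the machine is provisioned. This is a sketch only: the helper `unique_slave_name` and the 8-character suffix length are illustrative choices, not something the Swarm plugin prescribes.

```python
import uuid


def unique_slave_name(prefix):
    """Return `prefix` plus a random 8-character hex suffix.

    The suffix comes from a random UUID, so two machines created from the
    same image are very unlikely to request the same node name.
    """
    return f"{prefix}-{uuid.uuid4().hex[:8]}"
```

The generated value would then be passed to the swarm client as its slave name when it starts, for example through the client's name option. Check the client's help output for the exact flag in your version.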
