Windows JNLP Agents try to reconnect periodically

Issue

  • The Jenkins logs show many messages like the following, about every 10 seconds:
2017-08-29 21:09:27.724+0000 [id=...]	INFO   	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted connection #<connectionId> from <remoteSocketAddress>
  • The agent logs show many messages like the following, about every 10 seconds:
Aug 29, 2017 9:09:27 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: <agentName> is already connected to this master. Rejecting this connection.

Environment

  • CloudBees Jenkins Enterprise - Operations Center (CJEOC)
  • CloudBees Jenkins Platform - Operations Center (CJPOC)
  • CloudBees Jenkins Enterprise - Managed Master (CJEMM)
  • CloudBees Jenkins Platform - Client Master (CJPCM)
  • CloudBees Jenkins Team (CJT)
  • Jenkins LTS
  • WinSW

Resolution

If a Windows JNLP agent has a runaway process, or if the agent is outdated, the agent process may try to reconnect every 10 seconds even though the agent is already connected. In that situation, the master logs are flooded with connection messages like the following:

2017-08-29 21:09:27.724+0000 [id=...]	INFO   	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted connection #<connectionId> from <remoteSocketAddress>
2017-08-29 21:09:27.738-0000 [id=...]	WARNING	j.slaves.JnlpSlaveHandshake#error: TCP slave agent connection handler #<connectionId> with <remoteSocketAddress> is aborted: <agentName> is already connected to this master. Rejecting this connection.

Or since Jenkins 2.60.1 LTS:

2017-08-29 21:09:27.724+0000 [id=...]	INFO    h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted <protocol> connection #<connectionId> from <remoteSocketAddress>
2017-08-29 21:09:27.738-0000 [id=...]	WARNING	j.slaves.JnlpSlaveHandshake#error: TCP slave agent connection handler #<connectionId> with <remoteSocketAddress> is aborted: <agentName> is already connected to this master. Rejecting this connection.

If the connection fails instead of being rejected, you may see the following:

2017-08-29 21:09:27.724+0000 [id=...]	INFO   	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted connection #<connectionId> from <remoteSocketAddress>
2017-08-29 21:09:27.724+0000 [id=...]	WARNING	h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #<connectionId> failed
java.io.EOFException

These messages include details, such as the agent name or the remote socket address, that help identify which agents are at fault.
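On a busy master, grouping the rejected connections by agent makes the offenders easier to spot. The following is a minimal sketch, not an official tool: it assumes the master log uses the "already connected" warning format shown above, and the function name count_rejections is hypothetical.

```python
import re
from collections import Counter

# Matches the "already connected" warning shown above and captures the
# remote socket address and the agent name.
REJECTED = re.compile(
    r"connection handler #\d+ with (?P<remote>\S+) is aborted: "
    r"(?P<agent>.+?) is already connected to this master"
)


def count_rejections(lines):
    """Return a Counter keyed by (agent name, remote host)."""
    counts = Counter()
    for line in lines:
        match = REJECTED.search(line)
        if match:
            # Drop the source port (assumed to follow the last ':') so that
            # repeated attempts from the same host are grouped together.
            host = match.group("remote").rsplit(":", 1)[0]
            counts[(match.group("agent"), host)] += 1
    return counts
```

Passing an open log file to count_rejections and printing the result of most_common() lists the agents that retry most often first.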

Fix the Agents

1. Ensure that the agent(s) impacted do not have a runaway process for jenkins-slave.exe

Look for jenkins-slave.exe processes on the agent host. Killing the runaway process(es) or restarting the host should resolve the issue.
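The check can be scripted on the agent host. The sketch below assumes the standard Windows tasklist command is available and that its CSV output has the image name in the first column and the PID in the second; the helper names are hypothetical. Finding more PIDs than there are agent services installed on the host suggests a runaway process.

```python
import csv
import subprocess

IMAGE = "jenkins-slave.exe"


def parse_tasklist_csv(output):
    """Return the PIDs listed for IMAGE in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(output.splitlines()):
        # Expected rows: "jenkins-slave.exe","1234","Services","0","12,345 K"
        # The "INFO: No tasks are running..." line has one column and is skipped.
        if len(row) >= 2 and row[0].lower() == IMAGE and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def find_wrapper_pids():
    """Run tasklist (Windows only) and return the matching PIDs."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {IMAGE}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

The script only reports PIDs. Deciding which process to kill is left to the administrator.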

2. Ensure that the agent(s) impacted are up to date

You may encounter the issue after a Jenkins upgrade, because Windows JNLP agents (and the service wrapper) are not upgraded automatically. Upgrading the Windows agents and/or the Windows Service Wrapper should resolve the issue.

3. Configure the Windows agent to prevent overly frequent reconnections

In the agent configuration file jenkins-slave.xml, add the -noReconnect option to the startup command to prevent the agent from reconnecting automatically, and add an onfailure entry so that the service is restarted after a specific delay. For example:

<service>
  <id>agent</id>
  <name>JNLP Agent</name>
  <description>This service runs an agent for Jenkins continuous integration system.</description>
  <executable>C:\Program Files\Java\jre1.8.0_141\bin\java.exe</executable>
  <arguments>-Djsse.enableSNIExtension=false -Xrs  -jar "%BASE%\slave.jar" -noReconnect -jnlpUrl http://cjpcm.example.com/computer/windowsAgentJNLP/slave-agent.jnlp -secret XXXXXXXXXXXXXXXXXXXXXX</arguments>
  <logmode>rotate</logmode>
  <onfailure action="restart" delay="120 sec"/>
</service>
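When many agents need the same change, the configuration files can be checked with a short script. This is a rough sketch under the assumption that each file follows the layout of the example above (a single arguments element and one or more onfailure elements directly under service); the function name check_agent_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_agent_config(xml_text):
    """Return a list of problems found in a jenkins-slave.xml document."""
    root = ET.fromstring(xml_text)
    problems = []

    # The option must appear as its own token in the startup arguments.
    arguments = root.findtext("arguments") or ""
    if "-noReconnect" not in arguments.split():
        problems.append("-noReconnect is missing from <arguments>")

    # A restart action with a delay controls how often the service retries.
    restarts = [e for e in root.findall("onfailure") if e.get("action") == "restart"]
    if not restarts:
        problems.append('no <onfailure action="restart"> entry')
    elif not any(e.get("delay") for e in restarts):
        problems.append("<onfailure> restart entry has no delay")

    return problems
```

An empty list means both settings described above are present. The script does not judge whether the delay value itself is appropriate.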

Improvement since 2.60.x

Since the release of 2.60.1 LTS, the Windows Service Wrapper offers extensions to kill runaway processes and to upgrade agents automatically.

See also the dedicated article: JNLP agents (formerly slaves) get disconnected
