dedicated agents are not able to connect

Issue

Your inbound (formerly known as “JNLP”) build agent is failing to connect to your Jenkins controller.

Environment

Resolution

Since CloudBees Core 2.222.1.1, you can connect inbound agents using WebSocket transport instead of TCP. This achieves the same result with less complexity at the infrastructure level. WebSocket is the recommended approach, especially in Kubernetes environments, where it removes the need to open and manage nodePorts in your Kubernetes cluster.

[Recommended approach] Use WebSocket transport agents

The WebSocket feature landed in Jenkins core 2.217. It adds WebSocket transport support, available when connecting inbound agents or when running the CLI. The WebSocket protocol allows bidirectional, streaming communication over an HTTP(S) port.

Since CloudBees Core 2.222.1.1 you can use WebSocket transport to connect inbound agents, and this works as well for shared agents / clouds. Just select the WebSocket checkbox in agent / cloud configuration. No special network configuration is needed, since the regular HTTP(S) port proxied by the CloudBees CI ingress is used for all communications.

Use WebSocket checkbox

The main benefit of WebSocket is simpler connectivity: you do not need to configure the inbound TCP port in the networking elements in front of Jenkins. WebSocket is compatible with the HTTP/HTTPS protocols, so it uses the Jenkins URL for communication.

Troubleshooting WebSocket transport agents

If your agent uses WebSocket transport and you are encountering unexpected agent disconnections, you can add a custom logger for the class jenkins.agents.WebSocketAgents by following Configure Loggers for Jenkins.

WebSocket agent connectivity can also be traced with the third-party tool websocat (https://github.com/vi/websocat), provided your company approves its use. Run the following command, which pipes the output to the ts command (from the Linux moreutils package) to prefix each line with a timestamp:

websocat -vv --basic-auth "${USER}:${API_TOKEN}"  wss://${JENKINS_URL}/wsecho/ 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'

Note: The ${USER} must be a Jenkins administrator to use this command, and for the URL, take the current URL you use to access Jenkins (for example https://JENKINS_URL/) and change it to wss://JENKINS_URL/wsecho/.

This command will establish a WebSocket connection, and list the ping and pong messages for the socket, prefixed with timestamps so issues can be traced:

[2021-07-23 15:07:56] [INFO  websocat::ws_client_peer] get_ws_client_peer
[2021-07-23 15:07:57] [INFO  websocat::ws_client_peer] Connected to ws
[2021-07-23 15:07:57] [DEBUG websocat::ws_peer] Starting pinger
[2021-07-23 15:08:12] [INFO  websocat::ws_peer] Received WebSocket ping
[2021-07-23 15:08:27] [INFO  websocat::ws_peer] Sending WebSocket ping
[2021-07-23 15:08:27] [INFO  websocat::ws_peer] Received a pong from websocket
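
If you save the timestamped websocat output to a file, long gaps between ping and pong messages can be spotted automatically. The following is a minimal sketch, not part of any CloudBees tooling: the function name ping_gaps and the 60-second default threshold are assumptions chosen for illustration, and the parser assumes the exact ts timestamp format used in the command above.

```python
import re
from datetime import datetime

# "[YYYY-MM-DD HH:MM:SS]" prefix added by ts, followed by the websocat message.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")
# Whole-word match, so "Starting pinger" is not counted as a ping event.
EVENT_RE = re.compile(r"\b(ping|pong)\b", re.IGNORECASE)


def ping_gaps(log_text, max_gap_seconds=60):
    """Return (previous, current, gap_seconds) for each gap above the threshold.

    max_gap_seconds is a hypothetical threshold; set it to the idle timeout
    of your own ingress controller.
    """
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and EVENT_RE.search(match.group(2)):
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))

    gaps = []
    for previous, current in zip(times, times[1:]):
        delta = (current - previous).total_seconds()
        if delta > max_gap_seconds:
            gaps.append((previous, current, delta))
    return gaps
```

An empty result means no two consecutive ping or pong messages were further apart than the threshold.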

If you encounter disconnections, also review the logs of your ingress controller (for example NGINX) to see whether the WebSocket connection issues appear there.

The default WebSocket ping interval is 30 seconds, as per this code: https://github.com/jenkinsci/jenkins/blob/5c9976617cd6512c0d265b0e8a0623307f8d40bb/core/src/main/java/jenkins/websocket/WebSocketSession.java#L47-L58
Most ingress controllers expect a ping at least every 60 seconds in order to consider the connection active. Intermittent disconnections can therefore happen if your ingress controller uses a different timeout, or if your build agents become unresponsive under heavy build workload and cannot send the ping in time. If you are encountering intermittent disconnections, you can try increasing the ping frequency by lowering the ping interval from 30 seconds to 10 seconds with the startup option:

-Djenkins.websocket.pingInterval=10

The steps to add this startup option are:

[Alternative approach if WebSocket does not work] Verify required settings for inbound agents

Using WebSocket transport makes it much easier to handle special network topologies including reverse proxies and Kubernetes ingress. But if your agents are on the same network as the controller, TCP inbound agents are an excellent choice and fully supported.
In order to successfully connect an inbound agent with your Jenkins environment there are a few important pre-requisites:

  • The CloudBees CI instance must be listening on the TCP port used mainly for non-WebSocket inbound agents
  • The CloudBees CI instance must be reachable at HTTP level from the agent
  • The CloudBees CI instance must be reachable at TCP level from the agent

The CloudBees CI instance must be listening on the inbound (formerly JNLP) port

Go to Manage Jenkins -> Configure Global Security and ensure that the inbound agent port is set to either a fixed or a random port, and that at least the agent protocol Inbound TCP Agent Protocol/4 (TLS encryption) is enabled.

Take a thread dump of the instance by opening <JENKINS_URL>/threadDump in your web browser, and look for the TCP agent listener thread.

TCP agent listener port=31966
"TCP agent listener port=31966" Id=89 Group=main RUNNABLE (in native)
	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
	-  locked java.lang.Object@7b1b2a2
	at hudson.TcpSlaveAgentListener.run(TcpSlaveAgentListener.java:186)
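
If the thread dump is long, the listener port can be extracted from a saved copy of the page. This is a small sketch that assumes the thread name has the same format as in the example above; the function name find_listener_ports is hypothetical.

```python
import re

# Thread name as it appears in the thread dump, for example:
# "TCP agent listener port=31966" Id=89 Group=main RUNNABLE (in native)
LISTENER_RE = re.compile(r'"TCP agent listener port=(\d+)"')


def find_listener_ports(thread_dump_text):
    """Return the ports of all TCP agent listener threads found in the dump."""
    return [int(port) for port in LISTENER_RE.findall(thread_dump_text)]
```

An empty list suggests that the controller is not listening on an inbound TCP port.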

The CloudBees CI instance must be reachable at HTTP level from the agent

From the agent side run the command curl -ILv <JENKINS_URL> and check if you are getting the Jenkins headers such as:

...
< X-Hudson: 1.395
X-Hudson: 1.395
< X-Hudson-CLI-Port: 31966
X-Hudson-CLI-Port: 31966
< X-Jenkins: 2.204.1.3
X-Jenkins: 2.204.1.3
< X-Jenkins-CLI-Host: ec2-74-159-31-69.compute-1.amazonaws.com
X-Jenkins-CLI-Host: ec2-74-159-31-69.compute-1.amazonaws.com
...
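
To check the headers without reading the verbose output by eye, the saved curl output can be filtered for the Jenkins-specific headers. The sketch below assumes the output format shown above (header lines optionally prefixed with "< "); the function name jenkins_headers is an assumption made for illustration.

```python
def jenkins_headers(curl_output):
    """Collect the X-Hudson* and X-Jenkins* headers from saved curl output."""
    headers = {}
    for line in curl_output.splitlines():
        line = line.strip()
        # curl -v prefixes response headers with "< "; plain -I output does not.
        if line.startswith("< "):
            line = line[2:]
        name, separator, value = line.partition(":")
        if separator and name.lower().startswith(("x-hudson", "x-jenkins")):
            headers[name] = value.strip()
    return headers
```

If the returned dictionary has no X-Jenkins entry, the response most likely came from something other than Jenkins, such as a proxy error page.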

The CloudBees CI instance must be reachable at TCP level from the agent

You can test whether the CloudBees CI instance is reachable over TCP with either telnet <JENKINS_HOSTNAME> <INBOUND_PORT> or curl <JENKINS_HOSTNAME>:<INBOUND_PORT>.
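
Where neither telnet nor curl is installed on the agent, the same TCP check can be sketched with the Python standard library. The function name tcp_reachable and the 5-second timeout are assumptions; the check only confirms that a TCP connection can be opened, not that the agent protocol handshake succeeds.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, tcp_reachable("<JENKINS_HOSTNAME>", 31966) would test the port shown in the thread dump example, with the placeholder replaced by your controller hostname.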

Ensure that the Java version is at least on the same baseline on both controller and agent

A good practice is to run the exact same Java version on both the Jenkins controller and the agent. When this is not possible, run at least the same baseline on both.

Run java -version on both the Jenkins controller machine and the agent to check which Java version each one is running.
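
Comparing the baseline means comparing the major Java version on both sides. The sketch below extracts it from the text printed by java -version (which goes to standard error); the function name java_baseline is hypothetical, and the parser assumes the usual version "..." line, including the legacy 1.x numbering where 1.8 means Java 8.

```python
import re

# Matches the quoted version in lines such as:
#   openjdk version "11.0.12" 2021-07-20
#   java version "1.8.0_292"
VERSION_RE = re.compile(r'version "([^"]+)"')


def java_baseline(version_output):
    """Return the major Java version found in `java -version` output."""
    match = VERSION_RE.search(version_output)
    if not match:
        raise ValueError("no version string found")
    parts = re.split(r"[._+-]", match.group(1))
    # Legacy scheme: "1.8.0_292" denotes Java 8.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])
```

If java_baseline returns different numbers for the controller and the agent, the two are not on the same baseline.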

Ensure that the version of agent.jar matches the one on the controller

The main drawback of launching an inbound agent is that agent.jar is not automatically upgraded on the agent when you upgrade Jenkins, whereas the SSH launcher handles this out of the box.

Check that both copies of agent.jar are identical, for example with md5sum agent.jar. You can download agent.jar from the Jenkins controller at the URL below:

<JENKINS_URL>/jnlpJars/agent.jar
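
On machines without md5sum (for example Windows agents), the comparison can be sketched in Python. The function names md5sum and same_agent_jar are assumptions for illustration; the sketch expects that you have already downloaded the controller's agent.jar next to the copy used by the agent.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_agent_jar(agent_copy, controller_copy):
    """Return True if both agent.jar files have the same MD5 digest."""
    return md5sum(agent_copy) == md5sum(controller_copy)
```

If same_agent_jar returns False, replace the agent's copy with the one downloaded from the controller.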

Use jenkins-cli to check the connection

On the agent box, download <JENKINS_URL>/jnlpJars/jenkins-cli.jar from the Jenkins controller and run the command below:

java -jar jenkins-cli.jar -s <JENKINS_URL>/ -auth <USERNAME>:<API_TOKEN> help

Check that the inbound port and hostname are right

Launch the commands below and check that the port and hostname are the right ones:

curl -I <JENKINS_URL>/computer/<AGENT>/jenkins-agent.jnlp
curl -I <JENKINS_URL>/tcpSlaveAgentListener/

On a Windows box, curl can be obtained using, for example, the curl Download Wizard.

Load balancer or ha-proxy

If you are using a load balancer or HAProxy and you are not running Jenkins in HA mode, you may want to bypass them using the agent's advanced option Tunnel connection through.

Clear the Java Web Start Cache

If you see an error like the one below when starting the JNLP file, run the command javaws -clearcache to clear the Java Web Start cache.

java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(Unknown Source)
	at java.net.SocketInputStream.read(Unknown Source)
	at sun.security.ssl.InputRecord.readFully(Unknown Source)
	at sun.security.ssl.InputRecord.read(Unknown Source)
	at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
	at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
	at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.access$200(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.AccessController.doPrivilegedWithCombiner(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
	at com.sun.deploy.net.HttpUtils.followRedirects(Unknown Source)
	at com.sun.deploy.net.BasicHttpRequest.doRequest(Unknown Source)
	at com.sun.deploy.net.BasicHttpRequest.doGetRequestEX(Unknown Source)
	at com.sun.deploy.cache.ResourceProviderImpl.checkUpdateAvailable(Unknown Source)
	at com.sun.deploy.cache.ResourceProviderImpl.isUpdateAvailable(Unknown Source)
	at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
	at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
	at com.sun.javaws.LaunchDownload$DownloadTask.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

Information to be attached in case you need to open a Support ticket at CloudBees Support

  • An architecture diagram so we can understand what your environment looks like
  • A support bundle
  • md5sum of agent.jar in both boxes
  • Content of <JENKINS_URL>/computer/<AGENT>/jenkins-agent.jnlp
  • Content of <JENKINS_URL>/computer/<AGENT>/config.xml
  • The agent and controller logs that demonstrate that the connectivity is broken
  • Output of the commands below, launched from the agent box

curl -I <JENKINS_URL>/computer/<AGENT>/jenkins-agent.jnlp
curl -I <JENKINS_URL>/tcpSlaveAgentListener/
curl -ILv <JENKINS_URL>

4 Comments

  • -1
    John Mellor

    Ok, what is a "CJOC_URL"?  Gobbledegook phrase meaning "Canadian Joint Operations Command"?

    If the slave.jar has a different md5, how do I interpret that?  What API versions are compatible, and how do I determine that compatibility?

    I have a JNLP connection issue in K8S, and this document falls far short of what is required to debug this.

  • 1
    Denys Digtiar

    CJOC stands for CloudBees Jenkins Operations Center. For the purposes of this article, it should have just been JENKINS_URL.

    If the hash sum is different it means the versions are different between master and agent. The backward compatibility is maintained but the recommendation is to keep the slave.jar at the same version on both sides. Therefore if md5 is different, replace agent's slave.jar with the one downloaded from the master.

  • 0
    Byron Kim

    In the Load balancer or ha-proxy section, there should be a comment about idle timeouts if your Jenkins node is running through a LB/proxy. This can cause JNLP connection timeout errors.

    The default for ELB for instance is 60s and was causing some builds on slave nodes to fail on certain steps that took a long time to respond.

  • 0
    mark.kenneally Kenneally

    CLI authentication format has changed, it should be:

    java -jar jenkins-cli.jar -s http://<JENKINS_URL>/  -auth <username>:<APItoken> help