Symptoms
- I am not able to connect a JNLP agent to a Jenkins instance
- The build has failed because the connection was broken
- The build is stalled in the queue waiting for the agent
- The agent is disconnected and cannot reconnect
- A "Channel is broken" warning appears in the logs
- Any of the exceptions listed below
Diagnosis/Treatment
Before starting to diagnose the issue, the following data must be provided:
- Required Data: JNLP dedicated agents
- The folders of the last successful build (agent was connected) and of the failed build (agent was disconnected), as explained in Required Data: An issue with a Build of a Job
1. Requirements
1.A Ensure that master and agent run at least the same Java major version
A good practice is to run exactly the same Java version on both Jenkins and the agent. When this is not possible, both must at least run the same base line (major version). Check Supported JDK for CloudBees Core.
Run java -version on both the Jenkins master box and the agent to check which Java version each one is running.
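Pre-Java 9 runtimes report versions in the legacy 1.x form (for example 1.8.0_292), while later ones start with the major number (for example 11.0.12), so comparing raw strings can be misleading. A minimal Python sketch, assuming java is on the PATH; the helper names are hypothetical and not part of Jenkins. Run it on both boxes and compare the printed major versions:

```python
import re
import subprocess


def java_major_version(version: str) -> int:
    """Return the major version from a Java version string.

    Legacy scheme: "1.8.0_292" -> 8. Java 9+ scheme: "11.0.12" -> 11.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.<major>" form
    return first


def local_java_version() -> str:
    """Run `java -version` and return the quoted version string.

    `java -version` writes to stderr, e.g.: openjdk version "11.0.12" 2021-07-20
    """
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    match = re.search(r'version "([^"]+)"', result.stderr)
    if match is None:
        raise ValueError("no version found in `java -version` output")
    return match.group(1)


if __name__ == "__main__":
    version = local_java_version()
    print(f"{version} -> major {java_major_version(version)}")
```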
1.B Ensure that the agent.jar version matches the one on the master
The main problem with using JNLP as the agent launcher is that, when you upgrade Jenkins, agent.jar is not automatically upgraded on the agent, whereas the SSH launcher does this out of the box. On Windows, this can be solved by running the JNLP agent with winsw and adding the Remoting executable download to its configuration: <download from="${JENKINS_URL}/jnlpJars/agent.jar" to="%BASE%\agent.jar"/>.
Check that agent.jar is identical on both sides, for example with md5sum agent.jar. The master's agent.jar can be downloaded from the URL below:
<JENKINS_URL>/jnlpJars/agent.jar
Please refer to Remoting Best Practices – Agent Daemonization
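The md5sum comparison can also be scripted. A minimal Python sketch, assuming the master serves agent.jar without authentication at <JENKINS_URL>/jnlpJars/agent.jar and that the local copy's path is passed on the command line; the script name check_agent_jar.py and its helpers are hypothetical:

```python
import hashlib
import sys
import urllib.request


def md5_of_bytes(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Same digest as `md5sum <path>`, computed chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_url(url: str, timeout: float = 30.0) -> str:
    """Download url and return the MD5 of its body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return md5_of_bytes(response.read())


if __name__ == "__main__":
    # Usage: python check_agent_jar.py <JENKINS_URL> <path/to/agent.jar>
    jenkins_url, local_jar = sys.argv[1].rstrip("/"), sys.argv[2]
    remote = md5_of_url(f"{jenkins_url}/jnlpJars/agent.jar")
    local = md5_of_file(local_jar)
    print(f"master: {remote}\nagent:  {local}")
    print("MATCH" if remote == local else "MISMATCH: re-download agent.jar")
```

For example: python check_agent_jar.py https://jenkins:8443 /opt/jenkins/agent.jar (the local path is illustrative).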
Partial solutions:
- Use the Versions Node Monitors Plugin
- Share agent.jar via NFS
1.C Connectivity checks
Use jenkins-cli to check the connection
On the agent box, download the CLI and run a help command in your preferred mode. For example, using http mode:
java -jar jenkins-cli.jar [-s $JENKINS_URL] -auth <user>:<token> help
Check that the agent is able to see the Jenkins headers
# curl -IvL <JENKINS_URL>
curl -IvL https://jenkins:8443
On Windows, the curl command can be installed using, for example, the curl Download Wizard or Cygwin.
Check that the JNLP port is accessible to the agent
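The same header check can be done programmatically. Jenkins adds an X-Jenkins response header carrying its version, so finding it suggests the agent really reaches Jenkins rather than an intermediate page. A minimal Python sketch with hypothetical helper names; an HTTP error response (for example 403 for anonymous users) is still inspected, since Jenkins may include the header there too:

```python
import sys
import urllib.error
import urllib.request
from typing import Mapping, Optional


def jenkins_version(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Jenkins header value (case-insensitive lookup), or None."""
    for name, value in headers.items():
        if name.lower() == "x-jenkins":
            return value
    return None


def fetch_headers(url: str, timeout: float = 10.0) -> Mapping[str, str]:
    """HEAD request against url; redirects are followed, as with curl -L."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return dict(response.headers.items())
    except urllib.error.HTTPError as error:
        # e.g. 403 for anonymous users: the response headers are still useful
        return dict(error.headers.items())


if __name__ == "__main__":
    version = jenkins_version(fetch_headers(sys.argv[1]))
    print(f"Jenkins {version}" if version else "No X-Jenkins header: not reaching Jenkins?")
```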
# telnet <JENKINS_HOST> <JNLP_PORT>
telnet jenkins.host.example.com 50234
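If telnet is not installed on the agent box, a plain TCP connection attempt gives the same answer. A minimal Python sketch (port_is_reachable is a hypothetical helper); it only proves that the TCP handshake succeeds, not that the JNLP protocol works end to end:

```python
import socket
import sys


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
        return False


if __name__ == "__main__":
    # Usage: python check_port.py <JENKINS_HOST> <JNLP_PORT>
    host, port = sys.argv[1], int(sys.argv[2])
    print("reachable" if port_is_reachable(host, port) else "NOT reachable")
```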
2. Use a different launch mechanism
For Jenkins 2.204.1 LTS or later, switch to a different launch mechanism: Connect directly to TCP port.
3. Known issues
3.A. Unable to load class once the loading was interrupted
JENKINS-36991 (Unable to load class once the loading was interrupted) is resolved and released in remoting 2.61.
To confirm which remoting version your agent.jar (formerly slave.jar) file is tied to, run the following commands in the same directory as the .jar file and check the REMOTING_VERSION parameter in the output:
jar xf agent.jar META-INF/MANIFEST.MF
more META-INF/MANIFEST.MF
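The manifest can also be read without extracting it, since a .jar is a ZIP archive. A minimal Python sketch with hypothetical helpers: it prints the main manifest attributes and, when an attribute name is given, compares its value against 2.61, the remoting release that fixes JENKINS-36991. Which attribute holds the version is an assumption to confirm against your own output (the article refers to it as REMOTING_VERSION):

```python
import re
import sys
import zipfile
from typing import Dict, Optional, Tuple

FIXED_IN = (2, 61)  # remoting release that fixes JENKINS-36991


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse the main section of a JAR manifest into a dict.

    A line starting with a single space continues the previous value;
    the main section ends at the first blank line.
    """
    attributes: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        if not line:
            break
        if line.startswith(" ") and last_key is not None:
            attributes[last_key] += line[1:]
        elif ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip()
            attributes[last_key] = value[1:] if value.startswith(" ") else value
    return attributes


def read_manifest(jar_path: str) -> Dict[str, str]:
    """Read META-INF/MANIFEST.MF straight from the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return parse_manifest(jar.read("META-INF/MANIFEST.MF").decode("utf-8"))


def version_tuple(version: str) -> Tuple[int, ...]:
    """'2.52' -> (2, 52); stops at a non-numeric suffix such as '-SNAPSHOT'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break
    return tuple(parts)


if __name__ == "__main__":
    # Usage: python manifest.py agent.jar [ATTRIBUTE_HOLDING_THE_VERSION]
    attributes = read_manifest(sys.argv[1])
    for key, value in attributes.items():
        print(f"{key}: {value}")
    if len(sys.argv) > 2:
        version = attributes[sys.argv[2]]
        fixed = version_tuple(version) >= FIXED_IN
        print(f"{version}: {'includes' if fixed else 'does NOT include'} the JENKINS-36991 fix")
```

With the agent log shown below (Slave.jar version: 2.52), version_tuple("2.52") compares lower than (2, 61), so that agent predates the fix.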
Jenkins log / Build console output log
java.lang.NoClassDefFoundError: Could not initialize class jenkins.model.Jenkins
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at Script1.class$(Script1.groovy)
at Script1.$get$$class$jenkins$model$Jenkins(Script1.groovy)
at Script1.run(Script1.groovy:1)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:580)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:618)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:121)
at hudson.remoting.UserRequest.perform(UserRequest.java:49)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Agent log
Slave.jar version: 2.52
This is a Unix slave
Evacuated stdout
Slave successfully connected and online
Jul 27, 2016 8:36:57 AM jenkins.model.Jenkins <clinit>
SEVERE: Failed to load Jenkins.class
hudson.remoting.RemotingSystemException: java.lang.InterruptedException
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:266)
at com.sun.proxy.$Proxy5.fetch3(Unknown Source)
at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.thoughtworks.xstream.XStream.buildMapper(XStream.java:590)
at com.thoughtworks.xstream.XStream.<init>(XStream.java:568)
at com.thoughtworks.xstream.XStream.<init>(XStream.java:496)
at com.thoughtworks.xstream.XStream.<init>(XStream.java:465)
at com.thoughtworks.xstream.XStream.<init>(XStream.java:411)
at com.thoughtworks.xstream.XStream.<init>(XStream.java:350)
at hudson.util.XStream2.<init>(XStream2.java:88)
at jenkins.model.Jenkins.<clinit>(Jenkins.java:4217)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at Script1.class$(Script1.groovy)
at Script1.$get$$class$jenkins$model$Jenkins(Script1.groovy)
at Script1.run(Script1.groovy:1)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:580)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:618)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:121)
at hudson.remoting.UserRequest.perform(UserRequest.java:49)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at hudson.remoting.Request.call(Request.java:147)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:253)
... 30 more
3.B. Intermittent Invalid Object ID in remoting module
JENKINS-23271 Intermittent Invalid Object ID in remoting module
It is fixed in Jenkins core versions later than 2.32.
It happens frequently on Java 8 due to its object management logic.
It causes issues in task execution (build failures, agent disconnects).
Jenkins log / Build console output log
FATAL: Invalid object ID 18469 iota=18470
java.lang.IllegalStateException: Invalid object ID 18469 iota=18470
at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:277)
3.C. Ping Thread
See the Ping Thread documentation.
PingThread checks that the agent is able to execute a command from the master (a NOOP request).
The ping command may fail to execute when:
- The queue is overloaded and all agent workers are busy. On big boxes, you can increase the number of remoting TaskPool workers.
- The network is overloaded.
In some cases, disabling the PingThread can help.
If this is the stack trace you keep seeing, you should disable the PingThread. The side effect is that the agent is expected to hang, rather than disconnect, when communication between master and agent fails. The upside is that you will then be able to take a thread dump on both sides, master and agent.
Jenkins log / Build console output log
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1163)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
at hudson.remoting.PingThread.ping(PingThread.java:126)
at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at 1474633728617 hasn't completed by 1474633968617
... 2 more
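The two numbers in the TimeoutException appear to be epoch timestamps in milliseconds, so their difference is how long the ping was allowed to run before the channel was closed: 1474633968617 - 1474633728617 = 240000 ms, i.e. 240 s (4 minutes). A small Python sketch with hypothetical helper names that extracts this window from a log line, which makes it easy to compare with the ping timeout configured on your instance:

```python
import re
from datetime import datetime, timezone

PING_TIMEOUT = re.compile(r"Ping started at (\d+) hasn't completed by (\d+)")


def ping_window_seconds(log_line: str) -> float:
    """Seconds between ping start and the timeout deadline (both epoch ms)."""
    match = PING_TIMEOUT.search(log_line)
    if match is None:
        raise ValueError("not a PingThread timeout message")
    started_ms, deadline_ms = (int(value) for value in match.groups())
    return (deadline_ms - started_ms) / 1000


def epoch_ms_to_utc(epoch_ms: int) -> datetime:
    """Convert an epoch timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    line = ("Caused by: java.util.concurrent.TimeoutException: "
            "Ping started at 1474633728617 hasn't completed by 1474633968617")
    print(ping_window_seconds(line))  # 240.0
    print(epoch_ms_to_utc(1474633728617))
```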
3.D. JNLP cloud agents are disconnected during the start process
It affects Jenkins core starting with 2.28 (JENKINS-39232, regression in 2.28).
The fix relaxes the requirements of the JNLP connection receiver, which was rejecting connections from agents not using JNLPComputerLauncher (e.g. from Agent Setup, vSphere Cloud and other plugins). Now the connection is accepted from other proxying and filtering Launcher implementations. Particular plugins may require setting the -Djenkins.slaves.DefaultJnlpSlaveReceiver.disableStrictVerification=true system property in the master JVM to allow agents to connect.
4. HA / LB / Reverse proxy bypass
- It is highly recommended to add the -Dhudson.TcpSlaveAgentListener.hostName=$MASTER_IP Java property on the master. This way, the agent connection goes directly to the instance without passing through HAProxy, a load balancer, or a reverse proxy. See JNLP connectivity Best Practices.
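To verify which endpoint agents are actually told to use, query the TCP agent listener of the master. A Python sketch under stated assumptions: it assumes Jenkins answers <JENKINS_URL>/tcpSlaveAgentListener/ with an X-Jenkins-JNLP-Port header and, when hudson.TcpSlaveAgentListener.hostName is set, an X-Jenkins-JNLP-Host header; check both with curl -I before relying on it. Helper names are hypothetical. If the advertised host is still the load balancer or proxy name, the property is not in effect:

```python
import sys
import urllib.request
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse


def header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def advertised_endpoint(jenkins_url: str, headers: Mapping[str, str]) -> Tuple[str, int]:
    """Return the (host, port) inbound agents are expected to dial.

    Assumption: host from X-Jenkins-JNLP-Host when sent, otherwise from the
    Jenkins URL; port from X-Jenkins-JNLP-Port.
    """
    port = header(headers, "X-Jenkins-JNLP-Port")
    if port is None:
        raise ValueError("no X-Jenkins-JNLP-Port header: is the TCP agent port enabled?")
    host = header(headers, "X-Jenkins-JNLP-Host") or urlparse(jenkins_url).hostname or ""
    return host, int(port)


def fetch_listener_headers(jenkins_url: str, timeout: float = 10.0) -> Mapping[str, str]:
    url = jenkins_url.rstrip("/") + "/tcpSlaveAgentListener/"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return dict(response.headers.items())


if __name__ == "__main__":
    base_url = sys.argv[1]
    host, port = advertised_endpoint(base_url, fetch_listener_headers(base_url))
    print(f"inbound agents will connect to {host}:{port}")
```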
5. Clear the Java Web Start Cache
If you see an error like the one below when starting the JNLP file, run javaws -clearcache to clear the Java Web Start cache.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.security.ssl.InputRecord.readFully(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.access$200(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessController.doPrivilegedWithCombiner(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at com.sun.deploy.net.HttpUtils.followRedirects(Unknown Source)
at com.sun.deploy.net.BasicHttpRequest.doRequest(Unknown Source)
at com.sun.deploy.net.BasicHttpRequest.doGetRequestEX(Unknown Source)
at com.sun.deploy.cache.ResourceProviderImpl.checkUpdateAvailable(Unknown Source)
at com.sun.deploy.cache.ResourceProviderImpl.isUpdateAvailable(Unknown Source)
at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
at com.sun.javaws.LaunchDownload$DownloadTask.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
6. JNLP Windows agent
Runaway agent process
- In some cases, jenkins-agent.exe gets forcibly terminated (user action, fatal remoting failure, Windows service hard stop), and the java.exe process running the agent may be leaked.
- This causes multiple “slave is already connected” messages in the Jenkins log.
7. TCP retransmission timeout at OS level (consider tuning)
7.A Linux
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8
sysctl -w net.ipv4.tcp_fin_timeout=30
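The keepalive settings combine into a rough upper bound on how long an idle connection to a dead peer stays open: tcp_keepalive_time + tcp_keepalive_intvl × tcp_keepalive_probes, which with the values above is 120 + 30 × 8 = 360 s (the common Linux defaults of 7200 s, 75 s and 9 probes give 7875 s, over two hours). Keepalive only applies to sockets that enable it. A minimal Python sketch with hypothetical helpers that reads the current values from /proc/sys before you change them:

```python
from pathlib import Path
from typing import Dict

# sysctl names used in the commands above
KEYS = (
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_keepalive_intvl",
    "net.ipv4.tcp_keepalive_probes",
    "net.ipv4.tcp_fin_timeout",
)


def proc_path(sysctl_name: str) -> Path:
    """Dots in a sysctl name map to directories under /proc/sys."""
    return Path("/proc/sys") / sysctl_name.replace(".", "/")


def read_current() -> Dict[str, int]:
    """Read the current values (Linux only)."""
    return {name: int(proc_path(name).read_text().split()[0]) for name in KEYS}


def dead_peer_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Idle time before the first probe plus `probes` probes `interval` apart."""
    return idle + interval * probes


if __name__ == "__main__":
    current = read_current()
    for name, value in current.items():
        print(f"{name} = {value}")
    total = dead_peer_detection_seconds(
        current["net.ipv4.tcp_keepalive_time"],
        current["net.ipv4.tcp_keepalive_intvl"],
        current["net.ipv4.tcp_keepalive_probes"],
    )
    print(f"dead peer detected after about {total} s")
```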
7.B Windows
Things that you may want to know about TCP Keepalives
Avoiding TCP/IP Port Exhaustion
KeepAliveInterval = 30000 (milliseconds)
KeepAliveTime = 120000 (milliseconds)
TcpMaxDataRetransmissions = 8
TcpTimedWaitDelay = 30 (seconds)
7.C Mac
Using TCP keepalive to Detect Network Errors
net.inet.tcp.keepidle=120000
net.inet.tcp.keepintvl=30000
net.inet.tcp.keepcnt=8
Note: remoting 2.62.1 includes an improvement regarding keepalive on the client (agent) side.
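On macOS the keepalive values are expressed in milliseconds, so the settings above describe the same policy as the Linux ones: 120000 + 30000 × 8 = 360000 ms, i.e. 360 s. A minimal Python sketch with hypothetical helpers that parses the output of sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl net.inet.tcp.keepcnt and computes that total:

```python
import subprocess
from typing import Dict

NAMES = ("net.inet.tcp.keepidle", "net.inet.tcp.keepintvl", "net.inet.tcp.keepcnt")


def parse_sysctl(output: str) -> Dict[str, int]:
    """Parse lines such as 'net.inet.tcp.keepidle: 7200000' into a dict.

    macOS sysctl prints 'name: value'; 'name=value' is accepted as well.
    """
    values: Dict[str, int] = {}
    for line in output.splitlines():
        for separator in (":", "="):
            if separator in line:
                name, value = line.split(separator, 1)
                values[name.strip()] = int(value.strip())
                break
    return values


def detection_ms(values: Dict[str, int]) -> int:
    """keepidle + keepintvl * keepcnt; the timers are in milliseconds."""
    return (values["net.inet.tcp.keepidle"]
            + values["net.inet.tcp.keepintvl"] * values["net.inet.tcp.keepcnt"])


if __name__ == "__main__":
    out = subprocess.run(["sysctl", *NAMES], capture_output=True, text=True, check=True).stdout
    current = parse_sysctl(out)
    print(current)
    print(f"dead peer detected after about {detection_ms(current) / 1000:.0f} s")
```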
8. When all else fails
- Try adding this Java property on the master:
-Djenkins.slaves.NioChannelSelector.disabled=true
- Blocking I/O remains available as a fallback; NIO complicates things but improves performance.
- Try adding this Java property on the master:
-Djenkins.slaves.JnlpSlaveAgentProtocol3.enabled=false