Symptoms
- CM appears as disconnected in OC
- CM shows “Connect to Operations Center” when it should be already connected
- CM license expires
- Shared slaves/cloud are not leased to CM
Diagnostic/Treatment
- Pre-condition: CM and OC were previously correctly connected.
The following sections describe resolution paths for specific CM-OC issues. Each one is tied to a specific stack trace, but in some cases the root cause is hidden under a more general trace in the CM logs (such as ... Caused by: ... java.io.IOException: Remotely Closed), which calls for a deeper investigation.
OC-CM connectivity issue at HTTP level
OC logs
- OC connectivity logs URL:
http://oc.jenkins.example.com:8888/job/exampleClientMaster/log
[Mon Oct 24 12:12:09 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 12:12:19 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
at java.lang.Thread.run(Thread.java:745)
at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:140)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
... 12 more
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888
at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
... 3 more
CM logs
- CM connectivity log URL:
http://cje.jenkins.example.com:8080/operations-center/log
[Mon Oct 24 14:50:28 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 14:50:29 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
at java.lang.Thread.run(Thread.java:745)
at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
... 12 more
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
... 3 more
- A proxy might be configured in CM without OC being listed as a No Proxy Host
If a proxy is configured under Manage Jenkins → Manage Plugins → Advanced tab, make sure the OC hostname (e.g. oc.jenkins.example.com) is added to the No Proxy Host section.
- OC is not reachable from CM at HTTP level
From the CM host, try to curl the OC instance, for example:
curl -I http://oc.jenkins.example.com:8888/
The X-Jenkins header should appear in the output: X-Jenkins: 2.7.19.0.1 (CloudBees Jenkins Operations Center 2.7.19.0.1-fixed)
If this header does not appear, OC is not reachable from CM and you need to work with your network administrator to resolve the issue.
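The header check can also be scripted. The following is a minimal sketch, not an official CloudBees tool: the function name x_jenkins_version is hypothetical, and the URL in the usage comment is the example OC address used in this article.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# value of the X-Jenkins header. Returns non-zero when the header is absent.
x_jenkins_version() {
    awk 'BEGIN { rc = 1 }
         # Header names are case-insensitive; the colon keeps X-Jenkins-Session out.
         tolower($0) ~ /^x-jenkins:/ {
             sub(/^[^:]*:[ \t]*/, "")   # drop the header name
             sub(/\r$/, "")             # drop the trailing CR of the HTTP line
             print
             rc = 0
             exit
         }
         END { exit rc }'
}

# Usage from the CM host (example URL, adjust to your OC):
#   curl -sI http://oc.jenkins.example.com:8888/ | x_jenkins_version \
#     || echo "No X-Jenkins header: OC is not reachable over HTTP from this host"
```

When curl cannot connect at all it prints no headers, so the helper fails in the same way as when a non-Jenkins server answers.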
Related KB articles: How to troubleshoot client master connections
OC-CM connectivity issue at TCP level
OC logs
- OC connectivity logs URL:
http://oc.jenkins.example.com:8888/job/exampleClientMaster/log
[Mon Oct 24 12:15:57 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 12:15:57 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ completed
Agent address: oc.jenkins.example.com/192.168.1.44
Agent port: 50000
Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 12:15:57 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 12:15:57 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000
[Mon Oct 24 12:16:07 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
at java.lang.Thread.run(Thread.java:745)
at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
[Mon Oct 24 12:16:07 UTC 2016] Sleeping for 10s before retrying
CM logs
- CM connectivity log URL:
http://cje.jenkins.example.com:8080/operations-center/log
[Mon Oct 24 14:33:08 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 14:33:08 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ completed
Agent address: oc.jenkins.example.com/192.168.1.44
Agent port: 50000
Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 14:33:08 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 14:33:08 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000
[Mon Oct 24 14:33:18 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
at java.lang.Thread.run(Thread.java:745)
at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
This means there is an OC-CM connectivity issue at TCP level.
This usually happens because an intermediate element such as HAProxy, a firewall or an ELB is blocking the connection.
CloudBees recommends setting the following System Property in the OC Java options to bypass those intermediate elements:
-Dhudson.TcpSlaveAgentListener.hostName=<MACHINE_HOSTNAME>
If you don’t want to restart OC, add the Java argument and then apply the same value at runtime by running TcpSlaveAgentListener.CLI_HOST_NAME="OC_HOSTNAME" in the Script Console.
Related Documentation: Install and Configure CloudBees Jenkins Operations Center and Client Masters
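To test the TCP path by hand, it helps to know exactly which host and port discovery advertised. The following is a small sketch that pulls them out of a connectivity log like the ones above; the function name advertised_endpoint is hypothetical, and the nc flags in the usage comment assume a netcat variant that supports -z and -w.

```shell
# Hypothetical helper: reads a connectivity log on stdin and prints
# "<host> <port>" for the last endpoint reported by discovery.
advertised_endpoint() {
    awk '/Agent address:/ { split($NF, a, "/"); host = a[1] }  # keep the name before "/<ip>"
         /Agent port:/    { port = $NF }
         END {
             if (host != "" && port != "") print host, port
             else exit 1
         }'
}

# Usage, with the log saved to a local file (file name is an example):
#   advertised_endpoint < connectivity.log
#   nc -vz -w 10 oc.jenkins.example.com 50000   # run from the CM host
```

If the nc test times out from the CM host but succeeds from the OC host itself, that points to an intermediate element blocking the port.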
[Tue Feb 14 09:31:36 AEST 2017] Trying protocol: OperationsCenter2
[Tue Feb 14 09:31:36 AEST 2017] Opening TCP socket connection to oc.jenkins.example.com/127.0.0.1 on port 50001
[Tue Feb 14 09:31:46 AEST 2017] Socket connection is closed
[Tue Feb 14 09:31:46 AEST 2017] Connection refused: Connection closed before acknowledgement sent
com.cloudbees.opscenter.agent.protocol.impl.ConnectionRefusalException: Connection closed before acknowledgement sent
at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.onRecvClosed(AckFilterLayer.java:280)
at com.cloudbees.opscenter.agent.protocol.FilterLayer.abort(FilterLayer.java:163)
at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.access$000(AckFilterLayer.java:43)
at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer$1.run(AckFilterLayer.java:176)
at com.cloudbees.opscenter.agent.protocol.IOHub$DelayedRunnable.run(IOHub.java:935)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This issue happens when the JNLP port advertised by OC is incorrect, most likely because the System Property -Dhudson.TcpSlaveAgentListener.port=<JNLP_PORT> is set on OC but <JNLP_PORT> points to an application that is not Jenkins.
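One way to see which port is being advertised is to read the X-Jenkins-JNLP-Port response header that a standard Jenkins instance returns from its /tcpSlaveAgentListener/ URL. The sketch below assumes OC exposes the same endpoint and header as Jenkins core; the function name advertised_jnlp_port is hypothetical.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# port advertised in X-Jenkins-JNLP-Port. Returns non-zero when it is absent.
advertised_jnlp_port() {
    awk 'BEGIN { rc = 1 }
         tolower($0) ~ /^x-jenkins-jnlp-port:/ {
             sub(/^[^:]*:[ \t]*/, "")   # drop the header name
             sub(/[ \t\r]*$/, "")       # drop trailing whitespace and CR
             print
             rc = 0
             exit
         }
         END { exit rc }'
}

# Usage (example URL; -D - dumps the headers of a GET request to stdout):
#   curl -s -D - -o /dev/null http://oc.jenkins.example.com:8888/tcpSlaveAgentListener/ \
#     | advertised_jnlp_port
```

Compare the printed port with the one in the connectivity log, then check on the OC host which process is actually listening on it.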
OC-CM connectivity issue at TLS level
Log messages
- Exception in CM logs:
nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
WARNING: Pre-validation discovery on https://oc.jenkins.example.com:8888/ failed
javax.net.ssl.SSLHandshakeException: TLS Handshake exception establishing connection to Jenkins server: https://oc.jenkins.example.com:8888/. You might need to trust server's self-signed certificate on global security configuration.
...
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
- Message in the CM user interface
The most common cause of this issue is that OC is published with a self-signed SSL certificate and CM does not have that certificate installed in its truststore.
Follow these steps to fix the problem:
- Download the OC self-signed certificate. There are two ways to do it:
  - Using the openssl command (replace oc.jenkins.example.com:8888 with the host and SSL port of your OC instance, e.g. oc.example.com:443)
  openssl s_client -tls1 -showcerts -connect oc.jenkins.example.com:8888 </dev/null 2>/dev/null|openssl x509 -outform PEM > oc-certificate.pem
  - Downloading it directly from your browser (e.g. Chrome):
    - In the address bar, click the little lock with the X. This will bring up a small information screen. Click the button that says “Certificate Information.”
    - Click and drag the certificate image to your desktop and the certificate will be saved on the disk.
- Connect to your CM instance via SSH and copy the downloaded certificate to a temporary directory
- Install the certificate in the Java cacerts truststore
keytool -import -alias oc.jenkins.example.com -file oc-certificate.pem -keystore $JAVA_HOME/jre/lib/security/cacerts
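Because the openssl command above sends errors to /dev/null, a failed download can leave an empty oc-certificate.pem behind. The following is a minimal sanity check to run before the import; the function name is_single_pem_cert is hypothetical, and the keytool -list line in the usage comment assumes the alias and truststore from the steps above.

```shell
# Hypothetical helper: succeeds only if the given file is non-empty and
# contains exactly one PEM-encoded certificate block.
is_single_pem_cert() {
    [ -s "$1" ] || return 1
    [ "$(grep -c -- '-----BEGIN CERTIFICATE-----' "$1")" -eq 1 ] &&
    [ "$(grep -c -- '-----END CERTIFICATE-----' "$1")" -eq 1 ]
}

# Usage:
#   is_single_pem_cert oc-certificate.pem || echo "certificate download failed"
# After the import, confirm the alias is present in the truststore:
#   keytool -list -alias oc.jenkins.example.com -keystore "$JAVA_HOME/jre/lib/security/cacerts"
```

CM reads the truststore at startup, so a restart of CM is normally needed before the newly imported certificate takes effect.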
OC-CM connectivity issue at TLS Hostname verification
Log messages
- Exception in CM logs:
nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
WARNING: Pre-validation discovery on https://oc.local:8443/ failed
javax.net.ssl.SSLException: TLS hostname verification failure establishing connection to Jenkins server: https://oc.local:8443/ Certificate subject: CN=another.local issuer: CN=another.local
at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:415)
at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation.<init>(OperationsCenterRegistrar.java:500)
at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$DescriptorImpl.doPushRegistration(OperationsCenterRegistrar.java:316)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- Message in the CM user interface
This message occurs when the OC instance uses a self-signed certificate issued for another host. In the example above, the certificate was issued for another.local and OC is running on oc.local.
The solution is to create a new self-signed certificate for oc.local and use it to run OC:
- Create a new self-signed certificate
keytool -genkey -keyalg RSA -alias oc.local -keystore oc.local.jks -storepass jenkins -dname "cn=oc.local"
- Run OC using the new self-signed certificate
- Install the new certificate in CM, see OC-CM connectivity issue at TLS level
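To confirm which name the certificate presented by OC was issued for, you can print its subject with openssl and extract the CN. The following is a sketch under stated assumptions: the function name subject_cn is hypothetical, and it only looks at the subject CN, as in the log message above, not at Subject Alternative Names.

```shell
# Hypothetical helper: reads the output of "openssl x509 -noout -subject" on
# stdin and prints the CN. Handles both the older "/CN=host" layout and the
# newer "CN = host" layout of the subject line.
subject_cn() {
    sed -n 's/.*CN *= *\([^,/]*\).*/\1/p' | head -n 1
}

# Usage (example host and port from the log above):
#   openssl s_client -connect oc.local:8443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -subject | subject_cn
```

If the printed name differs from the hostname in the OC URL configured in CM, hostname verification fails as described above.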
Notes
Operations Center Agent (2.32.0.1 is the latest version at the time of writing) does not support TLS SNI.
To work around the problem, you can configure the certificate that CM expects as the default certificate on the OC reverse proxy. This should work as long as it remains the default one.