Operations Center-Client Master connectivity issues

Symptoms

  • CM appears as disconnected in OC
  • CM shows “Connect to Operations Center” when it should be already connected
  • CM license expires
  • Shared slaves/cloud are not leased to CM

Diagnostic/Treatment

  • Pre-condition: CM and OC were previously correctly connected.

The following sections describe resolution paths for specific CM-OC issues. Each one is tied to a specific stack trace, but in some cases the root cause is hidden behind a more general trace (such as ... Caused by: ... java.io.IOException: Remotely Closed in the CM logs), which calls for a deeper investigation.

OC-CM connectivity issue at HTTP level

OC logs

  • OC connectivity logs URL: http://oc.jenkins.example.com:8888/job/exampleClientMaster/log
[Mon Oct 24 12:12:09 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 12:12:19 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:140)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
    ... 12 more
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    ... 3 more

CM logs

  • CM connectivity log URL: http://cje.jenkins.example.com:8080/operations-center/log
[Mon Oct 24 14:50:28 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 14:50:29 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: http://oc.jenkins.example.com:8888/
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to http://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
    ... 12 more
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    ... 3 more
  • A proxy might be configured in CM and OC is not added as No Proxy Host

If a proxy is configured under Manage Jenkins → Manage Plugins → Advanced tab, ensure that the OC hostname is added to the No Proxy Host section, e.g. oc.jenkins.example.com

  • OC is not reachable from CM at HTTP level

From CM host, try to curl OC instance:

curl -I http://oc.jenkins.example.com:8888/

The X-Jenkins header should appear in the output, for example: X-Jenkins: 2.7.19.0.1 (CloudBees Jenkins Operations Center 2.7.19.0.1-fixed)

If this header does not appear, OC is not reachable from CM over HTTP, and you need to work with your network administrator to resolve the issue.
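
The header check can also be scripted. The following is a minimal sketch, assuming a POSIX shell with curl and grep available on the CM host. The function names has_x_jenkins_header and check_oc_http are hypothetical helpers introduced here for illustration, and the 10-second limit is an arbitrary choice.

```shell
# Succeeds if the HTTP response headers read from stdin contain an X-Jenkins header.
has_x_jenkins_header() {
  grep -qi '^X-Jenkins:'
}

# Fetches only the response headers of the URL given as $1 (10 s limit)
# and checks them for the X-Jenkins header.
check_oc_http() {
  curl -sI --max-time 10 "$1" | has_x_jenkins_header
}

# Example, run from the CM host:
#   check_oc_http http://oc.jenkins.example.com:8888/ && echo "OC reachable over HTTP"
```

A non-zero exit status from check_oc_http covers both cases described above: no response at all, or a response from something other than Jenkins.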

Related KB articles: How to troubleshoot client master connections

OC-CM connectivity issue at TCP level

OC logs

  • OC connectivity logs URL: http://oc.jenkins.example.com:8888/job/exampleClientMaster/log
[Mon Oct 24 12:15:57 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 12:15:57 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ completed
 Agent address: oc.jenkins.example.com/192.168.1.44
 Agent port:  50000
 Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 12:15:57 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 12:15:57 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000

[Mon Oct 24 12:16:07 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
   at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
   at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
   at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
   at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
   at java.lang.Thread.run(Thread.java:745)
   at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
[Mon Oct 24 12:16:07 UTC 2016] Sleeping for 10s before retrying

CM logs

  • CM connectivity log URL: http://cje.jenkins.example.com:8080/operations-center/log
[Mon Oct 24 14:33:08 UTC 2016] Starting discovery on http://oc.jenkins.example.com:8888/
[Mon Oct 24 14:33:08 UTC 2016] Discovery on http://oc.jenkins.example.com:8888/ completed
Agent address: oc.jenkins.example.com/192.168.1.44
Agent port:  50000
Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 14:33:08 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 14:33:08 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000
[Mon Oct 24 14:33:18 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
  at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
  at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
  at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
  at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
  at java.lang.Thread.run(Thread.java:745)
  at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)

This means there is an OC-CM connectivity issue at TCP level.

This usually happens because an intermediate element such as HAProxy, a firewall, or an ELB is blocking the connection.
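
To confirm that the port is blocked, the TCP connection can be tried by hand from the CM host. This is a minimal sketch, assuming a netcat (nc) build that supports the -z and -w options; tcp_port_open is a hypothetical helper name, and the host and port in the example are the agent address and port shown in the logs above.

```shell
# Succeeds if a TCP connection to host $1 on port $2 can be opened within 5 seconds.
# -z: connect without sending any data; -w 5: give up after 5 seconds.
tcp_port_open() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Example, run from the CM host:
#   tcp_port_open oc.jenkins.example.com 50000 || echo "TCP port 50000 is not reachable"
```

If the HTTP check succeeds but this one fails, the problem is limited to the agent port, which is consistent with the SocketTimeoutException in the logs.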

CloudBees recommends setting the system property below in the OC Java properties to bypass those intermediate elements.

-Dhudson.TcpSlaveAgentListener.hostName=

If you don’t want to restart OC after adding the Java argument, you can apply and test the setting by running TcpSlaveAgentListener.CLI_HOST_NAME="OC_HOSTNAME" in the Script Console.

Related Documentation: Install and Configure CloudBees Jenkins Operations Center and Client Masters

[Tue Feb 14 09:31:36 AEST 2017] Trying protocol: OperationsCenter2
[Tue Feb 14 09:31:36 AEST 2017] Opening TCP socket connection to oc.jenkins.example.com/127.0.0.1 on port 50001
[Tue Feb 14 09:31:46 AEST 2017] Socket connection is closed
[Tue Feb 14 09:31:46 AEST 2017] Connection refused: Connection closed before acknowledgement sent
com.cloudbees.opscenter.agent.protocol.impl.ConnectionRefusalException: Connection closed before acknowledgement sent
	at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.onRecvClosed(AckFilterLayer.java:280)
	at com.cloudbees.opscenter.agent.protocol.FilterLayer.abort(FilterLayer.java:163)
	at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.access$000(AckFilterLayer.java:43)
	at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer$1.run(AckFilterLayer.java:176)
	at com.cloudbees.opscenter.agent.protocol.IOHub$DelayedRunnable.run(IOHub.java:935)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

This issue happens when the JNLP port advertised by OC is incorrect, most likely because the system property -Dhudson.TcpSlaveAgentListener.port=<JNLP_PORT> is set on OC but <JNLP_PORT> points to an application that is not Jenkins.
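
The port that OC advertises is printed in the connectivity log as "Agent port". As a minimal sketch, assuming the log has been saved to a local file (connectivity.log is a hypothetical name), the value can be extracted and compared with the port Jenkins is expected to listen on. The helper name advertised_agent_port is also hypothetical.

```shell
# Prints the most recent "Agent port" value found in a connectivity log read from stdin.
# Prints nothing if the log contains no such line.
advertised_agent_port() {
  awk '/Agent port:/ { port = $NF } END { if (port != "") print port }'
}

# Example:
#   advertised_agent_port < connectivity.log
```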

OC-CM connectivity issue at TLS level

Log messages

  • Exception in CM logs:
 nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
 WARNING: Pre-validation discovery on https://oc.jenkins.example.com:8888/ failed
 javax.net.ssl.SSLHandshakeException: TLS Handshake exception establishing connection to Jenkins server: https://oc.jenkins.example.com:8888/. You might need to trust server's self-signed certificate on global security configuration.
 ...
 Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
  • Message in the CM user interface

TLSHandshakeException.png

The most common cause of this issue is that OC is served with a self-signed SSL certificate, and CM needs to have that certificate installed in its truststore.

Follow these steps to fix the problem:

  1. Download the OC self-signed certificate. There are two ways to do it:
  • Using the openssl command (replace oc.jenkins.example.com:8888 with the host of your OC instance and its configured SSL port, e.g. oc.example.com:443)

    openssl s_client -showcerts -connect oc.jenkins.example.com:8888 </dev/null 2>/dev/null|openssl x509 -outform PEM > oc-certificate.pem

  • Downloading directly from your browser (e.g. Chrome)

    • In the address bar, click the little lock with the X. This will bring up a small information screen. Click the button that says “Certificate Information.”
    • Click and drag the image to your desktop and the certificate will be saved on the disk.
  2. Connect to your CM instance via SSH and copy the downloaded certificate to a temporary directory
  3. Import the certificate into the Java cacerts truststore

    keytool -import -alias oc.jenkins.example.com -file oc-certificate.pem -keystore $JAVA_HOME/jre/lib/security/cacerts
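
The truststore location depends on the Java layout: a Java 8 JDK keeps it under jre/lib/security/cacerts, while a JRE or Java 9 and later keeps it under lib/security/cacerts. The following minimal sketch, assuming a POSIX shell, picks whichever of the two exists; find_cacerts is a hypothetical helper name.

```shell
# Prints the path of the cacerts truststore under the Java home given as $1.
# Returns non-zero if neither known location exists.
find_cacerts() {
  for candidate in "$1/jre/lib/security/cacerts" "$1/lib/security/cacerts"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example: confirm the import by listing the alias used in step 3.
#   keytool -list -alias oc.jenkins.example.com -keystore "$(find_cacerts "$JAVA_HOME")"
```

The Java home passed in should be the one that CM actually runs with, which may differ from the JAVA_HOME of the SSH session.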

OC-CM connectivity issue at TLS Hostname verification

Log messages

  • Exception in CM logs:
nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
WARNING: Pre-validation discovery on https://oc.local:8443/ failed
javax.net.ssl.SSLException: TLS hostname verification failure establishing connection to Jenkins server: https://oc.local:8443/ Certificate subject: CN=another.local issuer: CN=another.local
	at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:415)
	at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation.<init>(OperationsCenterRegistrar.java:500)
	at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$DescriptorImpl.doPushRegistration(OperationsCenterRegistrar.java:316)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

  • Message in the CM user interface

TLSHostnameVerification.png

This message appears when the OC instance uses a self-signed certificate issued for a different host. In the example above, the certificate was issued for another.local while OC is running on oc.local.
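
The mismatch can be checked against a downloaded certificate before changing anything. This is a minimal sketch, assuming a POSIX shell with sed and a certificate file such as oc-certificate.pem. The helper names subject_cn and cert_cn_matches_host are hypothetical. It compares only the subject CN with the hostname and ignores subject alternative names and wildcards, so it is a quick diagnostic rather than a full hostname verification.

```shell
# Prints the CN from an `openssl x509 -noout -subject` line read from stdin.
# Handles both the "subject= /CN=oc.local" and "subject=CN = oc.local" layouts.
subject_cn() {
  sed -n 's|.*CN *= *\([^,/]*\).*|\1|p'
}

# Succeeds if the CN in the subject line read from stdin equals the hostname in $1.
cert_cn_matches_host() {
  host=$1
  cn=$(subject_cn)
  [ "$cn" = "$host" ]
}

# Example:
#   openssl x509 -in oc-certificate.pem -noout -subject | cert_cn_matches_host oc.local \
#     || echo "certificate was not issued for oc.local"
```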

The solution is to create a new self-signed certificate for oc.local and use it to run OC:

  1. Create a new self-signed certificate

    keytool -genkey -keyalg RSA -alias oc.local -keystore oc.local.jks -storepass jenkins -dname "cn=oc.local"

  2. Run OC using the new self-signed certificate
  3. Install the new certificate in CM, see OC-CM connectivity issue at TLS level

Notes

Operations Center Agent (2.32.0.1 is the latest version at the time of writing) does not support TLS SNI.

To work around the problem, configure the certificate that CM expects as the default certificate on the OC reverse proxy. This works only as long as it remains the default one.
