Agent and Controller provisioning fails after upgrading Kubernetes, CloudBees CI on Modern Platforms, or the JDK

Issue

  • Agent / Controller provisioning fails and Jenkins logs show a stacktrace similar to the following:
okhttp3.internal.http2.ConnectionShutdownException
   at okhttp3.internal.http2.Http2Connection.newStream(Http2Connection.java:219)
   at okhttp3.internal.http2.Http2Connection.newStream(Http2Connection.java:205)
   [...]
   at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404)

Or the following:

okhttp3.internal.http2.StreamResetException: stream was reset: PROTOCOL_ERROR
	at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.java:158)
	at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:131)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
  [...]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)

Environment

Related Issue

Explanation

The cause lies on the Java side, more precisely in the okhttp library used by the Kubernetes client: it selects HTTP/2 to communicate with the Kubernetes API server even though the API server does not support it. Several changes intended to bring HTTP/2 support to Java 8 and to Java clients could explain the problem, along with a change in Kubernetes:

  • In CloudBees CI 2.176.4.3, the alpn-boot.jar was added to the classpath in CloudBees images to fix CPLT2-5621. This brings support for HTTP/2 and causes okhttp to use HTTP/2 in some circumstances.
  • In Kubernetes 1.17 and later, okhttp appears to wrongly choose HTTP/2 to communicate with Kubernetes even though the kube-apiserver does not support it. In our experience, this also happens with OpenShift 4.2 and later.

Running CloudBees CI 2.176.4.3 on a Kubernetes version older than 1.17 or an OpenShift version older than 4.2 is very risky. If Kubernetes/OpenShift is upgraded, controller and agent provisioning will likely stop working, and the only remedy is to upgrade CloudBees CI. In public clouds (GKE, EKS, AKS, …), the cloud operator might upgrade clusters automatically.

Several ways to work around the problem have since emerged:

  • kubernetes-client 4.4.0 introduced the system property http2.disable, which disables HTTP/2 and works around the problem.
  • CloudBees CI 2.190.2.2 ships kubernetes-client 4.4.0, so the system property http2.disable can be used.
  • In CloudBees CI 2.190.3.2, the alpn-boot.jar was removed from the classpath in CloudBees images to fix CPLT2-6044. The system property http2.disable should not be needed there, as this removal should prevent okhttp from choosing HTTP/2 when it is not supported.

The Kubernetes Client maintainers then addressed the problem directly, after it was discovered that JDK 8u252, which brings support for HTTP/2 with JEP-244, causes the same behavior:

  • Kubernetes Client 4.9.2 forces HTTP/1.1 to avoid the problem caused by JDK 8u252 (fabric8/kubernetes-client #2212).
  • CloudBees CI 2.235.1.2 installs JDK 8u252 but also ships kubernetes-client 4.9.2, which should definitively fix these problems.

Resolution

The recommended solution is to upgrade CloudBees CI to version 2.235.1.2 or later. That version guarantees that the Kubernetes client uses HTTP/1.1 when communicating with Kubernetes, which prevents this problem.

Workaround

CloudBees CI >= 2.190.2.2

If impacted, add the system property http2.disable=true to the startup arguments of Operations Center and Managed Controllers. See "How to add Java arguments to Jenkins on CloudBees CI Modern?" for details.
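
Java system properties are passed to the JVM as -D arguments, so the property takes the form -Dhttp2.disable=true. As a minimal sketch for confirming that a JVM actually received the property, the standalone class below reads it back with the standard System.getProperty call. The class name Http2PropertyCheck is hypothetical and is not part of CloudBees CI or kubernetes-client; it only reports what the JVM sees and does not change how the Kubernetes client behaves.

```java
// Hypothetical diagnostic helper, NOT part of CloudBees CI or kubernetes-client.
// It only reports whether the current JVM was started with -Dhttp2.disable=true.
class Http2PropertyCheck {

    // Name of the system property described in this article.
    static final String PROPERTY = "http2.disable";

    // Returns true only when the property is set to "true" (case-insensitive).
    static boolean isHttp2Disabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        // Print the raw value (null when unset) and the interpreted flag.
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY)
                + " (HTTP/2 disabled: " + isHttp2Disabled() + ")");
    }
}
```

Compiling this class and running it with and without -Dhttp2.disable=true shows the difference. The same one-line System.getProperty check could presumably be evaluated inside the running controller to verify the startup arguments took effect.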

CloudBees CI <= 2.176.4.3

If impacted, there is no workaround. CloudBees CI must be upgraded to a version that has a workaround (2.190.2.2 or later) or a version that contains the solution (2.235.1.2 or later).
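
The version thresholds above can be sketched as a small decision helper. This is an illustration only: the class RemedyAdvisor, its method names, and the returned labels are hypothetical, and the sketch assumes that every version below 2.190.2.2 falls into the "no workaround" case and that versions are purely numeric dotted strings.

```java
// Hypothetical helper that maps a CloudBees CI version to the guidance in this article.
class RemedyAdvisor {

    // Thresholds taken from the article.
    static final String WORKAROUND_FROM = "2.190.2.2"; // http2.disable available
    static final String FIXED_FROM = "2.235.1.2";      // kubernetes-client 4.9.2 included

    // Compares dotted numeric versions segment by segment; missing segments count as 0.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Returns "fixed", "workaround" (set http2.disable=true), or "upgrade".
    static String advise(String version) {
        if (compareVersions(version, FIXED_FROM) >= 0) {
            return "fixed";
        }
        if (compareVersions(version, WORKAROUND_FROM) >= 0) {
            return "workaround";
        }
        return "upgrade";
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.176.4.3", "2.190.2.2", "2.235.1.2"}) {
            System.out.println(v + " -> " + advise(v));
        }
    }
}
```

Comparing segment by segment as integers avoids the pitfall of plain string comparison, which would rank 2.90 above 2.190.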
