If you have trouble accessing a CloudBees Core Operations Center running on modern platforms, we recommend reviewing
- CloudBees managed master running on Kubernetes reports HTTP ERROR 404 in the browser.
- Unable to access Jenkins master from the browser.
- Jenkins master does not come up after upgrade on Kubernetes.
- The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.
The HTTP 404 Not Found Error means that the web page you were trying to reach could not be found on the server. It is a client-side error which means that the server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
- CloudBees CI (CloudBees Core)
- CloudBees CI (CloudBees Core) on modern cloud platforms - Managed Master
- CloudBees CI (CloudBees Core) on modern cloud platforms - Operations Center
This routing diagnostic guide helps you troubleshoot network issues affecting one of the members of CloudBees Core on Modern Platforms, that is, CloudBees Core running on a Kubernetes cluster. We assume that you have already followed the CloudBees installation guide and that every component of the cluster is configured as recommended in the following guides:
- CloudBees Core on modern cloud platforms administration guide
- CloudBees Core Reference Architecture - Kubernetes
For whatever reason, the questioned managed master, say mm1, does not work as expected. Most likely, the issue is related to a Kubernetes network misconfiguration that prevents the master from being provisioned, responding to requests, or being accessed in a browser. The following chart schematically shows CloudBees Core running on a Kubernetes cluster, an administrative kubectl agent, and a user browser that renders the problem.
Answer the questions below to narrow down the root cause of the outage; it will save you time in fixing it. For the sake of illustration, we use the mm1 managed master in the diagnostic steps.
1.] Can you access the Operations Center (CJOC) from a browser?
If not, use kubectl to further troubleshoot the Kubernetes cluster. Validate the status of the Operations Center (cjoc-0) pod. Most likely, the outage is more general and goes beyond a single master.
2.] Can you access the questioned master from a browser? Have you tried incognito mode?
The master is not accessible from any browser.
3.] Is there any other CloudBees managed master that you can access from a browser?
Yes. Try to access the questioned master pod with kubectl. It is possible that the JENKINS_HOME folder is accessible. In this case, the auto-generated support bundles from the JENKINS_HOME/support folder will help. Continue with this guide to get the explicit steps.
4.] Can you successfully provision a brand new managed master?
Yes. The solution is therefore to identify the broken piece of configuration for the questioned master.
5.] Does the Managed Master Configuration in the Operations Center (CJOC) UI show that the master is started?
Check whether CJOC considers the service to be running by clicking the Configure option on the Managed Master.
Try to stop and start the managed master and ensure there are no errors in the UI Provisioning log. If the start operation returns any errors, select Acknowledge Error in the left-hand menu of this UI as well, then check whether the connection is restored.
6.] Do messages like the ones below appear in the Operations Center (CJOC) while provisioning the master?
Sample error messages:
[Normal][PersistentVolumeClaim][your_master_pod_name_here][ExternalProvisioning] waiting for a volume to be created, either by external provisioner "example.com/aws-efs" or manually created by system administrator
[Warning][Pod][your_master_pod_name_here][FailedScheduling] pod has unbound immediate PersistentVolumeClaims
Yes. This is telling us that the PVC cannot be bound to the underlying PV, hence the pod cannot start. You should contact your Kubernetes support team for a deeper analysis as they will need to review the controller logs.
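Before escalating, it can help to list which claims are actually unbound. The following is a minimal sketch, not part of the CloudBees tooling: the function name `unbound_pvcs` is hypothetical, and it assumes the default `kubectl get pvc` column layout, in which NAME is the first column and STATUS is the second.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pvc` output on stdin and prints
# the name of every claim whose STATUS column is not "Bound".
# Assumes the default column layout (NAME first, STATUS second) and a
# header line, which is skipped.
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1 }'
}

# Example (needs access to the cluster):
#   kubectl get pvc | unbound_pvcs
```

An empty result suggests that every claim in the current namespace is bound; any name printed is a candidate to hand over to the Kubernetes support team.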
You are going to run curl commands to send HTTP requests to the questioned master
curl -I https://example.com/mm1/login
        |       |           |
        |       |           |__ managed master name, i.e. --prefix value
        |       |
        |       |__ external domain name, the one you use in a browser to access CloudBees Core
        |
        |__ http or https
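The three URL parts described above can be assembled with a small helper. This is only an illustrative sketch: the function name `master_login_url` is hypothetical, and the scheme, domain, and master name are whatever applies to your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the login URL of a managed master from its
# three parts: scheme (http or https), external domain name, master name.
master_login_url() {
  local scheme="$1" domain="$2" master="$3"
  printf '%s://%s/%s/login\n' "$scheme" "$domain" "$master"
}

# Example (needs network access to the cluster ingress):
#   curl -I "$(master_login_url https example.com mm1)"
```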
When you send curl requests to a healthy CloudBees Core master, it responds with the HTTP/1.1 200 OK status.
1.] Check the questioned master with curl
A healthy CloudBees Core master should be accessible on the network. Open a Linux terminal on any appropriate desktop and run
curl -I http://example.com/mm1/login
The full output should be similar to
HTTP/1.1 200 OK
Server: openresty/22.214.171.124
Date: Mon, 23 Sep 2019 06:44:39 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 1966
Connection: keep-alive
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache,no-store,must-revalidate
X-Hudson: 1.395
X-Jenkins: 126.96.36.199
X-Jenkins-Session: 6a97d870
X-Hudson-CLI-Port: 50001
X-Jenkins-CLI-Port: 50001
X-Jenkins-CLI2-Port: 50001
X-Frame-Options: sameorigin
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AM...
Set-Cookie: JSESSIONID.6284910f=node0ljc203mrj7m9199phkqz58czk1.node0;Path=/mm1;HttpOnly
If the master responds with HTTP ERROR 404 in the browser, most likely, you will see a similar response to
curl -I http://example.com/mm1/login
output from a non-responding master
HTTP/1.1 404 Not Found
Server: openresty/188.8.131.52
Date: Mon, 23 Sep 2019 04:56:50 GMT
Content-Type: text/html
Content-Length: 159
Connection: keep-alive
The 503 Service Unavailable error is an HTTP status code that means the web server is not available right now. Most likely, the Jenkins instance is restarting, too busy, or is not ready to handle the request.
HTTP/1.1 503 Service Unavailable
Server: openresty/184.108.40.206
Date: Mon, 23 Sep 2019 05:03:46 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 1056
Connection: keep-alive
X-Content-Type-Options: nosniff
Expires: 0
Cache-Control: no-cache,no-store,must-revalidate
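The three responses shown above (200, 404, 503) can be told apart from the first line of the curl output. The sketch below is a hypothetical helper, not a CloudBees utility; the function name `classify_status` and the wording of the messages are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the status line of a `curl -I` response to one
# of the situations described in this guide.
# $1 is a status line such as "HTTP/1.1 200 OK".
classify_status() {
  local code
  # Strip the trailing carriage return that HTTP header lines carry,
  # then take the second whitespace-separated field (the status code).
  code=$(printf '%s\n' "$1" | tr -d '\r' | awk '{ print $2 }')
  case "$code" in
    200) echo "healthy" ;;
    404) echo "not found: nothing matched the request URI" ;;
    503) echo "unavailable: instance restarting, too busy, or not ready" ;;
    *)   echo "unexpected status: $code" ;;
  esac
}

# Example (needs network access to the cluster ingress):
#   classify_status "$(curl -sI http://example.com/mm1/login | head -n1)"
```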
1.] Check the status of the questioned master pod. Is the pod running?
kubectl get pod mm1-0 -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE
mm1-0   1/1     Running   0          1m    10.52.25.3   gke-cluster-example-core-masters-72f5634a-txbz
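If you want to script this check, the READY and STATUS columns of the output above are enough. The following is a minimal sketch under stated assumptions: the function name `pod_is_healthy` is hypothetical, the input is the output of `kubectl get pod` for a single pod, and the header line is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pod <name>` output on stdin.
# Returns success only if the READY column shows all containers ready
# (for example 1/1) and the STATUS column is "Running".
# Assumes one header line followed by one pod line.
pod_is_healthy() {
  awk 'NR == 2 {
         split($2, ready, "/")
         ok = (ready[1] == ready[2] && $3 == "Running")
       }
       END { exit (ok ? 0 : 1) }'
}

# Example (needs access to the cluster):
#   kubectl get pod mm1-0 -o wide | pod_is_healthy && echo "pod is running"
```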
2.] Pod events key (last section in the output)
Describe the master pod and review the Events key (last section in the output)
kubectl describe pod mm1-0
3.] Check the location of the JENKINS_HOME folder
The default (CloudBees Core 2.190.x.x) location of the JENKINS_HOME folder is /var/jenkins_home
kubectl describe pod mm1-0 | grep jenkins_home
/var/jenkins_home from jenkins-home (rw)
4.] Check whether the pod responds to shell commands, and validate that the volumes are mounted
kubectl exec -ti mm1-0 -- df -h
expected output reads as follows
Filesystem      Size  Used Avail Use% Mounted on
overlay          95G  4.6G   90G   5% /
tmpfs            64M     0   64M   0% /dev
tmpfs           7.4G     0  7.4G   0% /sys/fs/cgroup
/dev/sda1        95G  4.6G   90G   5% /etc/hosts
/dev/sdc        9.8G  639M  8.7G   7% /var/jenkins_home
shm              64M     0   64M   0% /dev/shm
tmpfs           7.4G   12K  7.4G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           7.4G     0  7.4G   0% /proc/acpi
tmpfs           7.4G     0  7.4G   0% /proc/scsi
tmpfs           7.4G     0  7.4G   0% /sys/firmware
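To read the usage of a single mount point out of the df output above, a small filter is enough. This is a sketch, not an official tool: the function name `mount_usage` is hypothetical, and it assumes that the mount point is the last column and Use% the one before it, as in the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `df -h` output on stdin and prints the Use%
# value (without the percent sign) of the file system mounted on $1.
# Assumes the mount point is the last field and Use% the one before it.
mount_usage() {
  local mount_point="$1"
  awk -v mp="$mount_point" '$NF == mp {
    usage = $(NF - 1)
    sub(/%/, "", usage)
    print usage
  }'
}

# Example (needs access to the cluster):
#   kubectl exec mm1-0 -- df -h | mount_usage /var/jenkins_home
```

With the sample output above, the filter prints 7 for /var/jenkins_home.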
5.] Verify that the JENKINS_HOME partition has available disk space
Yes. The Jenkins instance has a decent amount of available disk space. Access the JENKINS_HOME folder inside the container and list its content
kubectl exec mm1-0 -- ls /var/jenkins_home
6.] Check read/write access to the JENKINS_HOME folder
Validate that the persistent volume resource accepts read/write operations
kubectl exec -ti mm1-0 -- bash -c 'echo "OK" > /var/jenkins_home/~writeTest.log \
  && cat /var/jenkins_home/~writeTest.log \
  && rm /var/jenkins_home/~writeTest.log'
The expected output is the single line OK
7.] Can you get the Jenkins log?
kubectl logs -f --tail 100 mm1-0
Analyze the output. Does it provide additional clues? Use Ctrl + c to quit the log mode.
8.] What is the last trace/error in the log?
The log files can vary significantly between instances because different levels of verbosity are allowed. In addition, you may see traces related to master configurations, installed plugins, and bootstrap scripts. The following traces are the most critical for assessing the health of the provisioned managed master.
[Mon Sep 23 06:40:36 GMT 2019] Requested provisioning successfully.
[Mon Sep 23 06:40:38 GMT 2019] Requested start successfully
[Mon Sep 23 07:41:38 GMT 2019][Normal][Pod][mm1-0][Pulled] Successfully pulled image "cloudbees/cloudbees-core-mm:220.127.116.11"
[Mon Sep 23 07:41:38 GMT 2019][Normal][Pod][mm1-0][Started] Started container jenkins
[Mon Sep 23 06:43:02 GMT 2019] Accepting initial connection from http://example.com/mm1/ on 10.52.25.3/10.52.25.3:39202 with identity f1:8d:d6:f6:5e:ed:fe:25:17:38:12:c0:cb:ce:a1:d6 (STORED)
[Mon Sep 23 06:43:14 GMT 2019] Connected
[Mon Sep 23 09:01:55 GMT 2019] Checking license validity...
[Mon Sep 23 09:01:55 GMT 2019] License will expire in 2 days 22 hr (not before next check)
[Mon Sep 23 09:01:55 GMT 2019] Current license is valid
9.] Does Jenkins log update in real time?
Yes. Analyze the output.
10.] Access (get a shell to a running container) the questioned master pod.
kubectl exec -ti mm1-0 -- bash
To check whether the Jenkins instance is started, i.e. the java process is running, run
ps -eaf | grep java
jenkins 7 1 23 06:41 ? 00:03:05 java -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Xbootclasspath/p:/usr/share/jenkins/alpn-boot.jar -Duser.home=/var/jenkins_home -Xmx2150m -Xms2150m -Djenkins.model.Jenkins.slaveAgentPort=50001 -Djenkins.install.runSetupWizard=true -Dhudson.lifecycle=hudson.lifecycle.ExitLifecycle -Duser.timezone=PST -DMASTER_NAME=mm1 -Dcb.BeekeeperProp.autoInstallIncremental=true -Djenkins.model.Jenkins.slaveAgentPortEnforce=true -DMASTER_GRANT_ID=0db2670e-c797-48b3-9042-e997207bf6be -Dcb.IMProp.warProfiles.cje=kubernetes.json -DMASTER_INDEX=0 -DMASTER_OPERATIONSCENTER_ENDPOINT=http://cjoc.example.svc.cluster.local/cjoc/ -Dcb.BeekeeperProp.noFullUpgrade=true -Dhudson.DNSMultiCast.disabled=true -DMASTER_ENDPOINT=http://example.com/mm1/ -XX:NativeMemoryTracking=summary -jar -Dcb.distributable.name=Docker Common CJE -Dcb.distributable.commit_sha=49e35b48176fc789078f52e12f8fb09382da938a /usr/share/jenkins/jenkins.war --webroot=/tmp/jenkins/war --pluginroot=/tmp/jenkins/plugins --prefix=/mm1/
jenkins 1334 1311 0 06:54 pts/0 00:00:00 grep java
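The java command line above carries the --prefix value, which is the managed master name used in the URL path. A quick way to pull it out of the long ps output is sketched below; the function name `jenkins_prefix` is hypothetical, and the sketch relies on the `-o` option of grep, which GNU and BSD grep provide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ps -eaf` output on stdin and prints the value
# of the first --prefix argument it finds (for example /mm1/).
# Prints nothing if no --prefix argument is present.
jenkins_prefix() {
  grep -o -- '--prefix=[^ ]*' | head -n1 | cut -d= -f2
}

# Example (needs access to the cluster):
#   kubectl exec mm1-0 -- ps -eaf | jenkins_prefix
```

If the printed prefix differs from the path you use in the browser or in curl, the request is addressed to a path the instance does not serve.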
1.] Start by identifying the internal IP address and ports of the questioned pod
kubectl describe pod mm1-0 | grep -E "IP|Ports"
IP:           10.52.25.3
Ports:        8080/TCP, 50001/TCP
Host Ports:   0/TCP, 0/TCP
2.] Check network statistics for the questioned pod
kubectl exec -ti mm1-0 -- netstat -tupe
expected output shows established connections
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User     Inode    PID/Program name
tcp        0      0 mm1-0.mm1.cje-sup:39202 cjoc.cje-support-:50000 ESTABLISHED jenkins  97862    7/java
tcp        0      0 mm1-0.mm1.cje-sup:45014 kubernetes.default.:443 ESTABLISHED jenkins  97824    7/java
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.25.1:35292        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.25.1:35248        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-sup:45018 kubernetes.default.:443 ESTABLISHED jenkins  99765    7/java
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.21.8:49890        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.21.8:49876        ESTABLISHED jenkins  117321   7/java
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.25.1:35272        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.25.1:35264        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.25.1:35256        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.21.8:49892        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.21.8:49880        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.21.8:49874        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.25.1:35280        TIME_WAIT   root     0        -
tcp        0      0 mm1-0.mm1.cje-sup:44990 kubernetes.default.:443 ESTABLISHED jenkins  99734    7/java
tcp        0      0 mm1-0.mm1.cje-supp:8080 10.52.21.8:49888        TIME_WAIT   root     0        -
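With this many rows, a per-state summary is easier to read than the raw table. The sketch below is a hypothetical helper (the name `count_tcp_states` is an assumption) that assumes the column layout shown above, where the connection state is the sixth column of each tcp row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `netstat -tupe` output on stdin and prints one
# "STATE count" line per TCP connection state, sorted by state name.
# Assumes the state is the sixth column; only rows whose first column is
# exactly "tcp" are counted, so header lines are ignored.
count_tcp_states() {
  awk '$1 == "tcp" { states[$6]++ }
       END { for (s in states) print s, states[s] }' | sort
}

# Example (needs access to the cluster):
#   kubectl exec mm1-0 -- netstat -tupe | count_tcp_states
```

A healthy pod is expected to show at least some ESTABLISHED connections, in line with the sample output above.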
1.] Validate the configured ports are open on the questioned master.
Use the netcat tool on the cjoc-0 pod to probe open ports of the questioned master pod
kubectl exec -ti cjoc-0 -- nc -zv 10.52.25.3 8080
kubectl exec -ti cjoc-0 -- nc -zv 10.52.25.3 50001
10.52.25.3 (10.52.25.3:8080) open
10.52.25.3 (10.52.25.3:50001) open
1.] Check the Service that defines access to the questioned master pod. You should see a Service of type ClusterIP
kubectl get services mm1 -o wide
NAME   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)            AGE   SELECTOR
mm1    ClusterIP   10.55.249.188   <none>        80/TCP,50001/TCP   41m   com.example.cje.tenant=mm1
2.] Describe the service and validate that the TargetPorts are set correctly
kubectl describe service mm1 | grep -E "IP|Port"
Type:        ClusterIP
IP:          10.55.249.188
Port:        http  80/TCP
TargetPort:  8080/TCP
Port:        agent  50001/TCP
TargetPort:  50001/TCP
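To compare the mappings at a glance, each Port line can be paired with the TargetPort line that follows it. This is a minimal sketch under stated assumptions: the function name `service_port_map` is hypothetical, and it expects the layout shown above, where a Port line has the form "Port: <name> <port>/<protocol>" and is followed by its TargetPort line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl describe service` output on stdin and
# prints one "port -> targetPort" line per service port.
# Assumes each "Port:" line (name in field 2, port in field 3) is followed
# by the matching "TargetPort:" line.
service_port_map() {
  awk '$1 == "Port:"       { port = $3 }
       $1 == "TargetPort:" { print port " -> " $2 }'
}

# Example (needs access to the cluster):
#   kubectl describe service mm1 | service_port_map
```

For the sample output above, the result shows the http port 80/TCP mapped to 8080/TCP and the agent port 50001/TCP mapped to 50001/TCP, which matches the ports reported for the pod.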
3.] Validate the service object for the questioned master
Validate that the TargetPort ports are open
kubectl exec -ti cjoc-0 -- nc -zv mm1 80
kubectl exec -ti cjoc-0 -- nc -zv mm1 50001
mm1 (10.55.249.188:80) open
mm1 (10.55.249.188:50001) open
4.] Describe the ingress object for the questioned master
Describe the ingress object and review
kubectl describe ing mm1
Name:             mm1
Namespace:        cje-example
Address:          18.104.22.168
Default backend:  default-http-backend:80 (<none>)
Rules:
  Host         Path   Backends
  ----         ----   --------
  example.com
               /mm1/   mm1:80 (10.52.31.29:8080)
Annotations:
  ingress.kubernetes.io/proxy-body-size:                50m
  ingress.kubernetes.io/proxy-request-buffering:        off
  ingress.kubernetes.io/ssl-redirect:                   true
  kubernetes.io/ingress.class:                          nginx
  nginx.ingress.kubernetes.io/proxy-body-size:          50m
  nginx.ingress.kubernetes.io/proxy-request-buffering:  off
  nginx.ingress.kubernetes.io/ssl-redirect:             true
Events:  <none>
and validate that the master pod can be reached from the CJOC pod
kubectl exec -ti cjoc-0 -- curl -I 10.52.31.29:8080/mm1/login | head -n1
kubectl exec -ti cjoc-0 -- curl -I example.com/mm1/login | head -n1
The output should be HTTP/1.1 200 OK.
If a master is expected to have Internet access, validate that it can reach an external URL
kubectl exec -ti mm1-0 -- nc -zv www.google.com 443
expected output is similar to
www.google.com (22.214.171.124:443) open
Follow the troubleshooting page
1.] Utility pod
To further troubleshoot the routing issues, we need tools such as dnsutils. You are going to create a diagnostic pod with the listed packages installed. On a Debian-based test pod, run
apt-get update && apt-get install iputils-ping iproute2 curl telnet netcat net-tools -y
Create a new
kubectl create deployment utility-pod --image=nginx
kubectl get pods | grep utility-pod
utility-pod-7b45c4f7dd-dj8gv 1/1 Running 0 85s
Open an interactive session on the utility-pod and install the tools.
kubectl exec -ti utility-pod-7b45c4f7dd-dj8gv -- bash
apt-get update && apt-get install iputils-ping iproute2 curl telnet netcat net-tools -y
Good troubleshooting guides include
Note: Once the issue is resolved, remove the utility-pod deployment
kubectl delete deploy utility-pod
2.] Involve k8s support team
Contact your corporate k8s support team for further troubleshooting.
Submit a CloudBees Support request and a CloudBees engineer will schedule a call with you.