HA Cluster doesn't start when messaging database is corrupted

Issue

In an HA configuration, the UI is inaccessible and shows this error:

Error

Jenkins detected that you appear running more than one instance of Jenkins that share the same home directory '/path/to/jenkins_home'. This is greatly confuses Jenkins and you will likely experience strange behaviors, so please correct the situation.

This Jenkins: 40843708317 contextPath:"/jenkins" 32323@hostname_node1
Other Jenkins: 138947981375 contextPath:"/jenkins" 4223@hostname_node2

Ignore this problem and keep using Jenkins anyway

The logs also show this stack trace during startup:

2016-08-24 11:06:12.763-0400 [id=53]	SEVERE	jenkins.InitReactorRunner$1#onTaskFailed: Failed Messaging.afterExtensionsAugmented
java.lang.Error: java.lang.reflect.InvocationTargetException
	at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:110)
	at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:176)
	at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
	at jenkins.model.Jenkins$8.runTask(Jenkins.java:926)
	at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
	at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:106)
	... 8 more
Caused by: java.io.IOError: java.io.EOFException
	at org.mapdb.Volume$FileChannelVol.getDataInput(Volume.java:1011)
	at org.mapdb.Volume$FileChannelVol.getDataInput(Volume.java:781)
	at org.mapdb.StoreDirect.get2(StoreDirect.java:469)
	at org.mapdb.StoreWAL.get2(StoreWAL.java:336)
	at org.mapdb.StoreWAL.get(StoreWAL.java:320)
	at org.mapdb.Caches$HashTable.get(Caches.java:246)
	at org.mapdb.EngineWrapper.get(EngineWrapper.java:58)
	at org.mapdb.BTreeMap.<init>(BTreeMap.java:541)
	at org.mapdb.DB.getTreeMap(DB.java:805)
	at com.cloudbees.opscenter.context.Messaging$Local.open(Messaging.java:611)
	at com.cloudbees.opscenter.context.Messaging$Local.access$400(Messaging.java:541)
	at com.cloudbees.opscenter.context.Messaging.open(Messaging.java:484)
	at com.cloudbees.opscenter.context.Messaging.afterExtensionsAugmented(Messaging.java:59)
	... 13 more
Caused by: java.io.EOFException
	at org.mapdb.Volume$FileChannelVol.readFully(Volume.java:947)
	at org.mapdb.Volume$FileChannelVol.getDataInput(Volume.java:1008)
	... 25 more

Environment

  • CloudBees Jenkins Enterprise
  • Operations Center Context Plugin

Resolution

The messaging database is corrupted, which prevents the cluster from starting, and it cannot be recreated automatically.

  • Stop all HA nodes.
  • Back up and remove the files $JENKINS_HOME/messaging, $JENKINS_HOME/messaging.p, and $JENKINS_HOME/messaging.t.
  • Start the cluster.
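
The back-up-and-remove step can be sketched as a small shell function. This is a minimal sketch, not an official procedure: the function name backup_messaging_db and the backup location in the example call are assumptions made for illustration, and it should only be run after every HA node has been stopped.

```shell
#!/bin/sh
# Sketch: back up and remove the messaging database files.
# Run only after all HA nodes are stopped.

# backup_messaging_db JENKINS_HOME BACKUP_DIR   (hypothetical helper name)
# Moves messaging, messaging.p and messaging.t, when present, into BACKUP_DIR.
backup_messaging_db() {
    jenkins_home=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for name in messaging messaging.p messaging.t; do
        if [ -e "$jenkins_home/$name" ]; then
            # mv keeps a copy in the backup directory and removes the original
            mv "$jenkins_home/$name" "$backup_dir/" || return 1
            echo "moved $name"
        else
            echo "not found: $name"
        fi
    done
}

# Example call (the backup path is an assumption, pick any safe location):
# backup_messaging_db "$JENKINS_HOME" "/var/backups/jenkins-messaging"
```

Moving the files rather than deleting them keeps the corrupted database available in case it is needed for later analysis.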

Resulting Issue that needs to be fixed

Because the messaging database has been deleted, messaging between Operations Center and the master will be out of sync. This needs to be corrected.

  • Run this script in Manage Jenkins > Script Console on Operations Center. It prints a list of connected masters and their instance IDs.
  • Find the instance ID of the master whose database files you just removed, then look for that instance ID in the section beginning with maxPulls:
  • Record the number shown in the matching entry: - $INSTANCE_ID: $NUMBER
  • Run this script in Manage Jenkins > Script Console on the master, replacing $NUMBER with the value recorded from the Operations Center (CJOC) script above:
    ```
    import com.cloudbees.opscenter.context.Messaging

    // Print the current outbox sequence ID, set it to $NUMBER + 1, then print it again to confirm.
    println Messaging.getInstance().local.outboxSequenceId
    Messaging.getInstance().local.outboxSequenceId.set($NUMBER + 1)
    println Messaging.getInstance().local.outboxSequenceId
    ```

This restores the master's outboxSequenceId to where it was before the database files were removed.
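
If the Operations Center script output has been saved to a file, the lookup of $NUMBER and the +1 adjustment can be sketched in shell. This is only a sketch under assumptions: the helper name next_outbox_id is hypothetical, and it assumes the entries after the maxPulls: line have exactly the form "- $INSTANCE_ID: $NUMBER" described above. Check the result against the raw output before using it.

```shell
#!/bin/sh
# next_outbox_id OUTPUT_FILE INSTANCE_ID   (hypothetical helper name)
# Prints NUMBER + 1 for the given instance ID, reading entries of the
# assumed form "- INSTANCE_ID: NUMBER" that follow the "maxPulls:" line.
# Returns non-zero if no matching entry is found.
next_outbox_id() {
    awk -v id="$2" '
        /maxPulls:/ { in_section = 1; next }
        in_section && $1 == "-" && $2 == (id ":") {
            # %.0f avoids exponent notation for large sequence numbers
            printf "%.0f\n", $3 + 1
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Example call (file name and instance ID are placeholders):
# next_outbox_id cjoc-output.txt "$INSTANCE_ID"
```

The printed value is the one to pass to outboxSequenceId.set(...) on the master.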
