Zookeeper does not start on a controller


  • Zookeeper does not start on a controller
  • We can see the following error in /var/logs/zookeeper/zookeeper.log
2017-11-16 09:13:37,768 - ERROR [main:FileTxnSnapLog@210] - Parent /marathon/leader missing for /marathon/leader/member_0000000096
2017-11-16 09:13:37,769 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /marathon/leader
  • We can see these alerts in CJPOC
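
To confirm the same symptom from a shell on the controller, the log can be searched for the error shown above. This is only a minimal sketch: the function name zk_db_load_failed is hypothetical, and the log path is the one quoted above, which may differ on your installation.

```shell
# Hypothetical helper: succeeds (exit code 0) if the given ZooKeeper log
# contains the "Unable to load database on disk" error quoted above.
zk_db_load_failed() {
    grep -q "Unable to load database on disk" "$1"
}

# Example usage on the controller (path taken from this article):
#   zk_db_load_failed /var/logs/zookeeper/zookeeper.log \
#       && echo "ZooKeeper could not load its on-disk database"
```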



First, check whether ZooKeeper is working on that controller by trying to connect to the ZooKeeper port (2181). Also check the Mesos and Marathon services to make sure that the rest of the controller services are working.

You can check these services from CJEOC: go to Manage Jenkins/Script Console and execute the following script, after changing the value of the controllerIP variable (def controllerIP = "") to the controller IP.

import java.net.InetSocketAddress
import java.net.Socket

// Returns true if a TCP connection to ip:port can be opened within 300 ms
def isReachable(ip, port) {
    boolean reachable = true
    Socket socket = new Socket()
    try {
        InetSocketAddress addr = new InetSocketAddress(ip, port)
        socket.connect(addr, 300)
    } catch (ex) {
        reachable = false
    } finally {
        // Always release the socket, whether or not the connection succeeded
        socket.close()
    }
    return reachable
}

// Set this to the IP of the controller you want to check
def controllerIP = ""
def zoo = new com.cloudbees.pse.config.metrics.zookeeper.ZooKeeperMetrics()
println "Ping ZooKeeper:" + zoo.ping(controllerIP, 2181)
println "Ping Mesos:" + isReachable(controllerIP, 5050)
println "Ping Marathon:" + isReachable(controllerIP, 8080)

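You can also run the same test from a worker: connect to it and execute these curl commands, replacing CONTROLLER_IP with the controller IP. ZooKeeper does not speak HTTP, so for port 2181 the only thing that matters is whether the connection is established.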

dna connect worker-X
curl -Iv CONTROLLER_IP:8080
curl -Iv CONTROLLER_IP:5050
curl -Iv CONTROLLER_IP:2181
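
If curl is not convenient, the same three ports can be probed with a plain TCP connection attempt. The following is a minimal sketch, assuming bash and the coreutils timeout command are available on the worker; the function names port_open and check_controller are hypothetical and not part of any CloudBees tooling.

```shell
# Hypothetical helper: succeeds if a TCP connection to host $1, port $2
# can be opened within 2 seconds (uses bash's /dev/tcp redirection).
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Prints one line per controller service, using the ports from this article.
check_controller() {
    for entry in "ZooKeeper:2181" "Mesos:5050" "Marathon:8080"; do
        name="${entry%%:*}"
        port="${entry##*:}"
        if port_open "$1" "$port"; then
            echo "$name ($port): reachable"
        else
            echo "$name ($port): NOT reachable"
        fi
    done
}

# Example usage, replacing CONTROLLER_IP with the controller IP:
#   check_controller CONTROLLER_IP
```

An open port only shows that something is listening. If nc is installed, and on ZooKeeper versions where the four-letter-word commands are enabled, echo ruok | nc CONTROLLER_IP 2181 should answer imok when the server is running.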

If you cannot connect to ZooKeeper on this controller, the ZooKeeper database on that controller has gone bad and has to be recreated. After we recreated it, all of the alerts cleared. These were the steps:

  • Connect to the controller - dna connect controller-X
  • Stop the ZooKeeper service - sudo service zookeeper stop
  • Rename the ZooKeeper database - sudo mv /var/lib/zookeeper/version-2 /var/lib/zookeeper/version-2-BAD
  • Start the ZooKeeper service - sudo service zookeeper start
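
The steps above, once connected to the controller, can be sketched as a small script. This is an illustration under assumptions, not an official procedure: the run helper, the DRY_RUN variable and the zk_recreate_db function are hypothetical names, and the commands and paths are the ones listed above.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stops ZooKeeper, moves the database directory aside and starts ZooKeeper
# again. Aborts if stopping the service or renaming the directory fails.
zk_recreate_db() {
    data_dir="${1:-/var/lib/zookeeper}"
    run sudo service zookeeper stop || return 1
    run sudo mv "$data_dir/version-2" "$data_dir/version-2-BAD" || return 1
    run sudo service zookeeper start
}

# Example usage: preview the commands first, then run them for real.
#   DRY_RUN=1 zk_recreate_db
#   zk_recreate_db
```

Renaming the directory instead of deleting it keeps the old data available for later inspection. If a version-2-BAD directory already exists from an earlier attempt, move it out of the way first, because mv would otherwise place version-2 inside it.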
