ZooKeeper does not start on a controller

Issue

  • ZooKeeper does not start on a controller
  • The following error appears in /var/logs/zookeeper/zookeeper.log:

 2017-11-16 09:13:37,768 - ERROR [main:FileTxnSnapLog@210] - Parent /marathon/leader missing for /marathon/leader/member_0000000096
 2017-11-16 09:13:37,769 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
 java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /marathon/leader

  • We can see these alerts in CJPOC
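
To confirm that a controller shows the log error described above, the log can be searched for the two characteristic messages. This is a minimal sketch: the function name find_db_load_errors is hypothetical, and the log path is the one quoted in this article, so adjust it if your installation logs elsewhere.

```shell
#!/usr/bin/env bash
# find_db_load_errors LOG_FILE (hypothetical helper, not part of any product CLI):
# prints the log lines that indicate ZooKeeper failed to load its on-disk database.
find_db_load_errors() {
    grep -E 'Unable to load database on disk|Failed to process transaction type' "$1"
}
```

For example, find_db_load_errors /var/logs/zookeeper/zookeeper.log prints the matching lines and returns a non-zero exit status if none are found.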

Environment

Resolution

First, check whether ZooKeeper is working on that controller by trying to connect to the ZooKeeper port (2181). Also check the Mesos (5050) and Marathon (8080) services to make sure that the rest of the controller services are working.

You can check these services from CJEOC: go to Manage Jenkins/Script Console and run the following script, after changing the value of the controllerIP variable (def controllerIP = "192.168.1.10") to the controller IP.

import java.net.InetSocketAddress
import java.net.Socket

// Returns true if a TCP connection to ip:port opens within 300 ms.
def isReachable(ip, port) {
    Socket socket = new Socket()
    try {
        socket.connect(new InetSocketAddress(ip, port), 300)
        return true
    } catch (Exception ex) {
        // Refused connection, timeout, unknown host, etc.
        return false
    } finally {
        // Always release the socket, even when the connection fails.
        socket.close()
    }
}

// Change this value to the IP of the controller to check.
def controllerIP = "192.168.1.10"
def zoo = new com.cloudbees.pse.config.metrics.zookeeper.ZooKeeperMetrics()
println "Ping ZooKeeper: " + zoo.ping(controllerIP, 2181)
println "Ping Mesos: " + isReachable(controllerIP, 5050)
println "Ping Marathon: " + isReachable(controllerIP, 8080)

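You can run the same test from a worker with these curl commands. Replace CONTROLLER_IP with the controller IP. ZooKeeper does not speak HTTP, so for port 2181 the check passes if curl reports that it connected, even if no HTTP response follows.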

dna connect worker-X
curl -Iv CONTROLLER_IP:8080
curl -Iv CONTROLLER_IP:5050
curl -Iv CONTROLLER_IP:2181
exit
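
Because the ZooKeeper port is not an HTTP endpoint, a plain TCP check can be easier to read than curl output. The following is a sketch only, assuming bash and the coreutils timeout command are available on the worker; check_port and report_port are hypothetical helper names, and the 2-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeeds if a TCP connection opens within 2 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
check_port() {
    timeout 2 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$1" "$2" 2>/dev/null
}

# report_port NAME HOST PORT: prints one status line for a service.
report_port() {
    if check_port "$2" "$3"; then
        echo "$1 ($2:$3): reachable"
    else
        echo "$1 ($2:$3): unreachable"
    fi
}

# The checks run only when CONTROLLER_IP is set in the environment,
# for example: CONTROLLER_IP=192.168.1.10
if [ -n "${CONTROLLER_IP:-}" ]; then
    report_port ZooKeeper "$CONTROLLER_IP" 2181
    report_port Mesos     "$CONTROLLER_IP" 5050
    report_port Marathon  "$CONTROLLER_IP" 8080
fi
```

A reachable port only shows that something is listening. If nc is installed, echo ruok | nc CONTROLLER_IP 2181 should print imok from a healthy ZooKeeper (on ZooKeeper 3.5 and later this four-letter command may need to be whitelisted).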

If you cannot connect to ZooKeeper on this controller, the ZooKeeper database on that controller has probably gone bad and has to be recreated. In our case, all of the alerts cleared after recreating it with these steps:

  • Connect to the controller - dna connect controller-X
  • Stop the ZooKeeper service - sudo service zookeeper stop
  • Rename the ZooKeeper database - sudo mv /var/lib/zookeeper/version-2 /var/lib/zookeeper/version-2-BAD
  • Start the ZooKeeper service - sudo service zookeeper start
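
The stop, rename, and start steps above can be sketched as one script to run on the affected controller. This is an illustration rather than a supported tool: ZK_DIR, DRY_RUN, run, and recreate_zk_db are hypothetical names introduced here, and the script defaults to printing the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Illustrative variables (assumptions, not part of any product CLI).
ZK_DIR="${ZK_DIR:-/var/lib/zookeeper}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run CMD...: prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_zk_db() {
    # Refuse to overwrite a backup left by an earlier recovery.
    if [ -e "$ZK_DIR/version-2-BAD" ]; then
        echo "backup $ZK_DIR/version-2-BAD already exists, aborting" >&2
        return 1
    fi
    # Same three commands as the steps above; stop at the first failure.
    run sudo service zookeeper stop &&
    run sudo mv "$ZK_DIR/version-2" "$ZK_DIR/version-2-BAD" &&
    run sudo service zookeeper start
}
```

After sourcing the file, recreate_zk_db prints the three commands. Setting DRY_RUN=0 before calling it executes them.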