Kafka startup fails with zookeeper timeout (remote server), yet the machine can connect to zookeeper directly

When I start Kafka, it fails quickly, complaining that it cannot connect to ZooKeeper. I am running ZooKeeper as a separate (standalone) three-node ensemble. I am confused because there is no firewall between the servers, as the zookeeper-shell.sh test further down shows.

From /var/log/kafka/server.log:

2016-02-24 16:07:12,101 INFO kafka.server.KafkaServer: [Kafka Server 1], Connecting to zookeeper on 10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181
2016-02-24 16:07:20,291 FATAL kafka.server.KafkaServerStartable: Fatal error during KafkaServerStable startup. Prepare to shutdown
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:113)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:69)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
    at kafka.Kafka$.main(Kafka.scala:46)
    at kafka.Kafka.main(Kafka.scala)
2016-02-24 16:07:20,294 INFO kafka.server.KafkaServer: [Kafka Server 1], shutting down
2016-02-24 16:07:20,312 INFO kafka.server.KafkaServer: [Kafka Server 1], shut down completed
2016-02-24 16:07:20,317 INFO kafka.server.KafkaServer: [Kafka Server 1], shutting down
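
One way to chase the 6000 ms timeout in the log is to time a plain TCP connect from the broker host to each server in the connect string. This is a minimal sketch; `parse_connect_string`, `measure_connect_ms` and `check_ensemble` are hypothetical helper names, not part of Kafka or ZooKeeper. It only measures TCP reachability: the ZooKeeper session handshake and any DNS lookups can add time on top, so a fast result here does not rule out a slow session setup.

```python
import socket
import time

DEFAULT_ZK_PORT = 2181  # ZooKeeper's conventional client port


def parse_connect_string(connect):
    """Split 'h1:2181,h2:2181/chroot' into [(host, port), ...].

    An optional chroot suffix such as '/kafka' is dropped, and entries without
    a port fall back to 2181. IPv6 literals are not handled in this sketch.
    """
    hosts = connect.split("/", 1)[0]
    pairs = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if ":" in entry:
            host, port = entry.rsplit(":", 1)
            pairs.append((host, int(port)))
        else:
            pairs.append((entry, DEFAULT_ZK_PORT))
    return pairs


def measure_connect_ms(host, port, timeout_s):
    """Return the TCP connect time in milliseconds, or None on failure/timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:  # covers refused connections and socket timeouts
        return None
    return (time.monotonic() - start) * 1000.0


def check_ensemble(connect, budget_ms=6000):
    """Map 'host:port' to its connect time in ms (None = failed within budget_ms)."""
    results = {}
    for host, port in parse_connect_string(connect):
        results[f"{host}:{port}"] = measure_connect_ms(host, port, budget_ms / 1000.0)
    return results
```

On the broker host you would call something like `check_ensemble("10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181")` and look for `None` entries or times that approach the budget.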

However, from the /opt/kafka install directory I can connect to ZooKeeper using the same ensemble connection string, so I really doubt it is the network or a firewall:

[me@dckafka01 kafka]$ cd /opt/kafka
[me@dckafka01 kafka]$ bin/zookeeper-shell.sh 10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181

Connecting to 10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181
Welcome to ZooKeeper!
JLine support is disabled
WATCHER::WatchedEvent state:SyncConnected type:None path:null

get /blah
null
cZxid = 0x400000009
ctime = Tue Feb 16 09:00:28 EST 2016
mZxid = 0x400000009
mtime = Tue Feb 16 09:00:28 EST 2016
pZxid = 0x40000017e
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2

ls /blah
[applications, registry]

This is as expected. Does anybody have an angle I could investigate?
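One thing worth checking: a working zookeeper-shell.sh session does not prove that the broker's budget is large enough. As far as I know, the shell runs ZooKeeper's `ZooKeeperMain`, which defaults to a 30000 ms session timeout and waits for the connection in the background, whereas the broker's ZkClient gave up after 6000 ms. To see how each server answers on its own, ZooKeeper's `ruok` four-letter command returns `imok` when the server is running without error. The sketch below assumes four-letter commands are enabled (on ZooKeeper 3.5 and later they may need to be allowed via `4lw.commands.whitelist`); `zk_four_letter` and `is_ok` are hypothetical names.

```python
import socket


def zk_four_letter(host, port, cmd="ruok", timeout_s=6.0):
    """Send a ZooKeeper four-letter-word command and return the raw reply text.

    The server closes the connection after replying, so read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the socket
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def is_ok(host, port, timeout_s=6.0):
    """True if the server answers 'imok' to 'ruok' within timeout_s seconds."""
    try:
        return zk_four_letter(host, port, "ruok", timeout_s) == "imok"
    except OSError:  # refused, unreachable, or timed out
        return False
```

Running `is_ok(...)` against each of 10.7.20.100, .101 and .102 from the broker host, and timing it, shows whether one particular server is slow to respond.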

akaphenom asked Feb 24 '16 21:02


2 Answers

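Well, increasing the timeout helped: I raised zookeeper.connection.timeout.ms from the 6000 ms shown in the error to 16000 (see the config below). Now I need to chase down the network delays.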

cat config/server.properties

# coding: UTF-8 
# This file created by Chef from template. Do not hand edit this file

log.dirs=/var/kafka
port=9092
num.partitions=4
default.replication.factor=3
log.flush.interval.messages=1
log.retention.minutes=43200
log.retention.check.interval.ms=3600000
num.replica.fetchers=4
replica.fetch.wait.max.ms=5000
replica.lag.max.messages=10000
auto.leader.rebalance.enable=true
num.network.threads=8
advertised.host.name=10.7.20.71
zookeeper.connection.timeout.ms=16000
broker.id=1
zookeeper.connect=10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181
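
The relevant change in the file above is `zookeeper.connection.timeout.ms=16000`, up from the 6000 ms reported in the original error. To double-check which value a broker will actually use, here is a small sketch that reads a server.properties file. The parser is a deliberate simplification of the Java properties format (no `:` separators, escapes or line continuations), and the fallback to `zookeeper.session.timeout.ms` (default 6000) is my assumption about 0.9-era Kafka behavior, so verify it against the broker config docs for your version. Both function names are hypothetical.

```python
def parse_properties(text):
    """Minimal key=value parser: skips blank lines and '#' or '!' comments.

    Simplification: ':' separators, escapes and line continuations are ignored.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_zk_connection_timeout_ms(props):
    """Connection timeout the broker should use, in milliseconds.

    Assumption: falls back to zookeeper.session.timeout.ms, then to 6000,
    mirroring (as I understand it) 0.9-era KafkaConfig defaults.
    """
    if "zookeeper.connection.timeout.ms" in props:
        return int(props["zookeeper.connection.timeout.ms"])
    return int(props.get("zookeeper.session.timeout.ms", "6000"))
```

For the file above, `effective_zk_connection_timeout_ms(parse_properties(open("config/server.properties").read()))` returns 16000.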
akaphenom answered Nov 03 '22 12:11


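In my case, the command prompt window running the ZooKeeper servers had frozen. This happens fairly often on Windows, likely because clicking in a console with QuickEdit mode enabled starts a text selection, which pauses the program writing to that console.

Pressing any key in that window made it responsive again, and after that, starting Kafka gave me no errors.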

paradocslover answered Nov 03 '22 11:11