
ZooKeeper on the same node as Kafka?

I am setting up a Kafka + ZooKeeper cluster. Let's say I want 3 Kafka brokers. I am wondering whether I can set up 3 machines with Kafka on them and then run the ZooKeeper cluster on the same nodes, so that each machine hosts both a Kafka broker and a ZooKeeper node, instead of having 3 machines for Kafka and 3 machines for ZooKeeper (6 in total).

What are the advantages and disadvantages? These machines will most probably be dedicated to running Kafka and ZooKeeper. I would like to know whether I can reduce costs a bit without sacrificing performance.

KTrum asked Jun 24 '17 09:06



2 Answers

We have been running ZooKeeper and Kafka brokers on the same nodes in a production environment for years without any problems. The cluster handles very high QPS and I/O traffic, so I dare say that our experience applies to most scenarios.

The advantage is simple: it saves machines. Kafka brokers are I/O-intensive, while ZooKeeper nodes use little disk I/O or CPU, so in most cases they won't disturb each other.

But do remember to keep watching your CPU and I/O usage (network as well as disk), and increase cluster capacity before either becomes a bottleneck.

I don't see any disadvantages, because we have very good cluster capacity planning.
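For illustration, here is a minimal sketch of what the colocated layout means for the configuration. The hostnames are hypothetical, and the ports are only the commonly used defaults (2181 for clients, 2888 and 3888 between ZooKeeper peers), so adjust them to your setup. It just builds the `server.N` lines for `zoo.cfg` and the `zookeeper.connect` value for the brokers' `server.properties`.

```python
# Hypothetical hostnames: each machine runs one Kafka broker and one ZooKeeper node.
HOSTS = ["node1.example.com", "node2.example.com", "node3.example.com"]


def zoo_cfg_server_lines(hosts, peer_port=2888, election_port=3888):
    """Build the server.N entries that go into zoo.cfg on every node."""
    return [
        f"server.{node_id}={host}:{peer_port}:{election_port}"
        for node_id, host in enumerate(hosts, start=1)
    ]


def kafka_zookeeper_connect(hosts, client_port=2181):
    """Build the zookeeper.connect value for server.properties on every broker."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


if __name__ == "__main__":
    for line in zoo_cfg_server_lines(HOSTS):
        print(line)
    print("zookeeper.connect=" + kafka_zookeeper_connect(HOSTS))
```

Each ZooKeeper node also needs a `myid` file in its data directory containing its own `N` from the matching `server.N` line.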

Weibo Li answered Nov 29 '22 19:11


It makes sense to colocate them when the Kafka cluster is small, 3-5 nodes. But keep in mind that you are colocating two applications that are both sensitive to disk I/O. The workloads, and how chatty they are with the local ZooKeeper nodes, also play an important role here, especially with regard to page cache memory usage.

Once the Kafka cluster grows to a dozen or more nodes, colocating a ZooKeeper node on each one will create quorum overhead (slower writes, more nodes in quorum checks), so a separate ZooKeeper cluster has to be in place.
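To put rough numbers on the quorum point, here is a small sketch of the majority-quorum arithmetic. ZooKeeper needs a strict majority of the ensemble to acknowledge every write, so each added node raises the number of acknowledgements required. The function names are my own, purely for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Nodes that can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a small colocated ensemble with one ZooKeeper node per broker at 12 brokers.
for n in (3, 5, 12):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With 3 nodes a write needs 2 acknowledgements, and with 12 it needs 7. An even-sized ensemble also tolerates no more failures than the odd size just below it (12 nodes tolerate 5, the same as 11), which is one more reason not to tie the ZooKeeper node count to the broker count.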

Overall, if Kafka cluster usage is low from the start and you want to save some costs, it is reasonable to start with them colocated, but have a migration strategy for setting up a separate ZooKeeper cluster so that you are not caught off guard once the Kafka cluster has to be scaled horizontally.

Alexz answered Nov 29 '22 19:11