What is the difference between a partition and a replica of a topic in a Kafka cluster?

What is the difference between a partition and a replica of a topic in a Kafka cluster? I mean, both store copies of the messages in a topic, so what is the real difference?

Gaurav Khare asked Nov 26 '14 13:11


People also ask

What is the difference between partition and replica in Kafka?

In Kafka, replication happens at the partition level, i.e. copies of each partition are maintained on multiple broker instances. When we say a topic has a replication factor of 3, we mean there are three copies of each of its partitions.
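To make "three copies of each partition" concrete, here is a minimal sketch of spreading replicas over brokers. It assumes a plain round-robin placement, which is a simplification: Kafka's own assignment also randomizes the starting broker and can be rack-aware. The function name `assign_replicas` is hypothetical.

```python
# Simplified sketch of replica placement (NOT Kafka's exact algorithm).
# Assumption: plain round-robin placement; the real broker-side assignment
# also randomizes the starting broker and can take racks into account.

def assign_replicas(num_partitions, replication_factor, brokers):
    """Return {partition: [broker ids]}; the first id is the preferred leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Take replication_factor consecutive brokers starting at position p,
        # wrapping around, so no broker holds two copies of one partition.
        assignment[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return assignment

layout = assign_replicas(num_partitions=3, replication_factor=3, brokers=[1, 2, 3, 4])
for partition, replicas in layout.items():
    print(f"partition {partition}: copies on brokers {replicas}")
# partition 0: copies on brokers [1, 2, 3]
# partition 1: copies on brokers [2, 3, 4]
# partition 2: copies on brokers [3, 4, 1]
```

Three partitions with a replication factor of 3 give nine stored partition copies in total, spread over four brokers, and no broker ever holds two copies of the same partition.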

What is the difference between replication and partitioning?

Replication: keep a copy of the same data on several different nodes. Partitioning: split the database into smaller subsets and distribute the partitions to different nodes. Transactions: mechanisms to ensure that data is kept consistent in the database.

What are partitions and replication factor in Kafka?

Every partition in a Kafka topic has a write-ahead log where the messages are stored and every message has a unique offset that identifies its position in the partition's log. Every topic partition in Kafka is replicated n times, where n is the replication factor of the topic.
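The per-partition log and its offsets can be modeled in a few lines. This is a minimal in-memory sketch, assuming a Python list stands in for Kafka's on-disk log segments; the class name `PartitionLog` is hypothetical. It also shows why a replica is not a second partition: a follower copies the same records in the same order, so each record keeps the same offset on every replica.

```python
# Minimal in-memory model of ONE partition's append-only log.
# Assumption: a list stands in for Kafka's on-disk segment files.

class PartitionLog:
    def __init__(self):
        self._records = []  # position in this list == offset

    def append(self, value: str) -> int:
        """Append a record and return the offset assigned to it."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10):
        """Return up to max_records (offset, value) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(offset, end)]

    @property
    def log_end_offset(self) -> int:
        # Offset that the next appended record will receive.
        return len(self._records)

leader = PartitionLog()
for msg in ["a", "b", "c"]:
    print(msg, "->", leader.append(msg))  # a -> 0, b -> 1, c -> 2

# A follower replica of the same partition copies the leader's records in
# order, so every record has the same offset on every replica.
follower = PartitionLog()
for _, value in leader.read(0):
    follower.append(value)
print(follower.read(1))  # [(1, 'b'), (2, 'c')]
```

Offsets are only unique within a partition: offset 1 in another partition of the same topic refers to a different message.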

What are replicas in Kafka?

In Kafka parlance, replication means having multiple copies of the data spread across multiple servers (brokers). This maintains high availability when one of the brokers goes down and can no longer serve requests.


2 Answers

When you add a message to a topic, you call the send(KeyedMessage message) method of the producer API (this is the 0.8 producer API; newer Java clients use KafkaProducer.send(ProducerRecord)). This means your message consists of a key and a value. When you create a topic, you specify the number of partitions it should have. When you call "send" for this topic, the message is sent to exactly ONE partition, chosen (by default) from the hash of its key. Each partition may also be replicated, which means the partition and its replicas store the same data. The limitation is that both your producer and your consumer work only with the main (leader) replica, and the other copies are used only for redundancy (since Kafka 2.4, consumers can optionally be configured to fetch from a follower replica instead).
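The "one message goes to exactly one partition" rule can be sketched as follows. This assumes `zlib.crc32` as a stand-in hash; Kafka's Java producer actually applies murmur2 to the serialized key bytes, so the partition numbers here will not match a real cluster, only the property does. `choose_partition` and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib
from collections import defaultdict

# Sketch of key-based partition selection.
# Assumption: zlib.crc32 stands in for the murmur2 hash that Kafka's Java
# producer applies to the serialized key, so real partition numbers differ;
# the property shown (same key -> same single partition) is what matters.

def choose_partition(key, num_partitions):
    """Map a key to exactly one partition in [0, num_partitions)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 3
messages = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]

partitions = defaultdict(list)
for key, value in messages:
    # Each message is stored in ONE partition; copying that partition to
    # other brokers (replication) is a separate concern.
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

for p in sorted(partitions):
    print(p, partitions[p])
```

All messages for `user-1` land in the same partition in send order, and the three messages are stored once each across the partitions. Any additional copies come only from replicating those partitions.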

Refer to the documentation: http://kafka.apache.org/documentation.html#producerapi. There is also a basic training deck: http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign

0x0FFF answered Sep 30 '22 02:09


Topics are partitioned across multiple nodes so that a topic can grow beyond the limits of a single node. Partitions are replicated for fault tolerance. Replication and leader takeover are among the biggest differences between Kafka and other brokers or Flume. From the Apache Kafka site:

Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
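The leader takeover described in the quote can be sketched for a single partition. This is a simplified model under stated assumptions: the set of in-sync replicas (ISR) is simply given and shrinks on failure, whereas a real cluster tracks it dynamically and the controller performs the election. Only in-sync followers are promoted, mirroring Kafka's default behaviour. The `Partition` class and `broker_failed` method are hypothetical.

```python
# Simplified model of leader failover for ONE partition.
# Assumptions: the in-sync replica set (ISR) is given up front; in a real
# cluster it is tracked dynamically and the controller runs the election.

class Partition:
    def __init__(self, replicas):
        self.replicas = list(replicas)  # assigned brokers, first = preferred leader
        self.isr = list(replicas)       # assume all replicas start in sync
        self.leader = self.replicas[0]  # leader serves all reads and writes

    def broker_failed(self, broker):
        """Drop a failed broker from the ISR and elect a new leader if needed."""
        if broker in self.isr:
            self.isr.remove(broker)
        if broker == self.leader:
            # Promote the first surviving in-sync follower; None means the
            # partition is offline until a replica comes back.
            self.leader = self.isr[0] if self.isr else None

p = Partition(replicas=[2, 3, 1])
p.broker_failed(2)       # the leader's broker goes down
print(p.leader, p.isr)   # 3 [3, 1]
```

With three replicas the partition stays available through two broker failures in this model; losing the last in-sync replica leaves it without a leader.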

techuser soma answered Sep 30 '22 02:09