What is the maximum replication factor for a partition of a Kafka topic?

I have a Kafka cluster with 3 brokers and a couple of topics, each with 5 partitions. Now I want to set the replication factor for the partitions.

What is the maximum replication factor I can set for a partition of a Kafka topic?

KayV asked Nov 11 '19 18:11


People also ask

What should be the maximum replication factor?

When choosing a replication factor, a common rule of thumb is to use at least 2 and at most 4. The usual recommendation is 3: it provides a good balance between performance and fault tolerance, and cloud providers typically offer 3 data centers / availability zones per region to deploy to.

What is partition and replication factor in Kafka?

Every topic partition in Kafka is replicated n times (where n is the replication factor defined by the user), which means that n copies of that partition are present on different brokers in the cluster.

How many replication factors are there in Kafka?

The replication factor is the number of copies of the data kept on different brokers. It should generally be greater than 1 (typically 2 or 3), so that a replica of the data is stored on another broker from which it can still be read if one broker fails.

What is the maximum number of partitions in Kafka?

In Kafka, a topic can have multiple partitions to which records are distributed.


2 Answers

The replication factor determines the number of replicas each partition has. Replicas allow Kafka to automatically fail over when a server in the cluster fails, so that messages remain available.

Partition replicas are distributed across brokers, and a broker can hold at most one replica of a given partition, which means we can't have more replicas than brokers.

Replication factor <= number of brokers, so the maximum replication factor equals the number of brokers.
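To make the one-replica-per-broker rule concrete, here is a minimal sketch for the 3-broker, 5-partition setup from the question. It uses a simplified round-robin placement (the function name `assign_replicas` and the broker ids are hypothetical, and this is not Kafka's exact assignment algorithm, which also randomizes the starting broker and can be rack-aware). As with a real cluster, which rejects such a request with an `InvalidReplicationFactorException`, asking for more replicas than brokers fails.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Simplified round-robin placement (NOT Kafka's exact algorithm):
    partition p gets its replicas on `replication_factor` consecutive
    brokers, starting at broker index p."""
    if replication_factor > len(brokers):
        # Each replica of a partition must live on a distinct broker,
        # so there must be at least as many brokers as replicas.
        raise ValueError(
            f"replication factor {replication_factor} exceeds "
            f"the {len(brokers)} available brokers"
        )
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


brokers = [1, 2, 3]  # hypothetical broker ids for a 3-broker cluster
for p, replicas in assign_replicas(5, 3, brokers).items():
    print(f"partition {p}: replicas on brokers {replicas}")

try:
    assign_replicas(5, 4, brokers)
except ValueError as e:
    print("rejected:", e)
```

With replication factor 3, every partition ends up with one replica on each of the 3 brokers; a 4th replica would have nowhere distinct to go.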

The replication factor also bounds min.insync.replicas, which should always be less than or equal to the replication factor:

min.insync.replicas <= replication factor

min.insync.replicas is the minimum number of in-sync replicas (the leader included) that must be available for a partition to keep accepting writes from producers that use acks=all.
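The interaction between the two settings can be sketched as a small model. The helper names `validate_topic_config` and `accepts_acks_all_write` are hypothetical, and the model ignores everything except the in-sync replica count: it only shows that an acks=all write is accepted while the in-sync replica set has at least min.insync.replicas members (a real broker rejects it otherwise with a NotEnoughReplicas error).

```python
def validate_topic_config(replication_factor, min_insync_replicas):
    """Enforce 1 <= min.insync.replicas <= replication factor."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")


def accepts_acks_all_write(in_sync_replicas, min_insync_replicas):
    """Simplified model: an acks=all produce request succeeds only if the
    partition's in-sync replica set (leader included) is large enough."""
    return in_sync_replicas >= min_insync_replicas


rf, min_isr = 3, 2  # assumed example values
validate_topic_config(rf, min_isr)
for isr in range(rf, 0, -1):
    print(f"{isr} in-sync replicas -> acks=all write accepted: "
          f"{accepts_acks_all_write(isr, min_isr)}")
# Replica-hosting brokers that can be lost while acks=all writes still succeed:
print("tolerated replica failures for acks=all writes:", rf - min_isr)
```

With replication factor 3 and min.insync.replicas 2, losing one replica still allows writes; losing two blocks acks=all producers until a replica catches up again.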

A replication factor of 3 is usually a good choice. Depending on the use case, you can tune it: a replication factor below 2 means high risk, while more than 3 provides better availability at the cost of more overhead and more disk space.

When deciding on a replication factor, also consider the following points:

A) Broker disk size: the replication factor directly impacts overall broker disk usage, so a high replication factor requires more disk space.

B) Large number of partition replicas: replicating a large number of partitions adds extra latency.
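Point A is simple arithmetic: every partition is stored once per replica, so cluster-wide disk usage grows linearly with the replication factor. The sketch below is a rough estimate under assumed numbers (a hypothetical topic with 5 partitions of 10 GiB each on 3 brokers); it ignores index files, compaction, and retention fluctuations, and `total_disk_bytes` is a hypothetical helper.

```python
def total_disk_bytes(partition_size_bytes, num_partitions, replication_factor):
    """Rough estimate: each partition's data is stored once per replica."""
    return partition_size_bytes * num_partitions * replication_factor


GIB = 1024 ** 3
BROKERS = 3  # assumed cluster size, as in the question

for rf in (1, 2, 3):
    total = total_disk_bytes(10 * GIB, 5, rf)
    print(f"RF={rf}: {total // GIB} GiB across the cluster, "
          f"~{total / BROKERS / GIB:.1f} GiB per broker if spread evenly")
```

Going from replication factor 1 to 3 triples the storage needed, which is the overhead being traded for fault tolerance.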

Nitin answered Oct 04 '22 17:10


A broker can only host a single replica for a partition.

So if your cluster has 3 brokers, the maximum replication factor you can have is 3.

While it's in theory possible to set up a topic with a very large replication factor, in practice there's rarely any benefit to setting it above 4. Replicas are used for high availability and durability, and they basically determine how many brokers can go offline (up to the replication factor minus 1) before you lose any data. If you have 3 replicas, it's unlikely that all 3 brokers will crash or fail at the same time.
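The "how many brokers can go offline" reasoning can be checked by brute force. This sketch reuses a simplified round-robin placement (not Kafka's real assignment) on a hypothetical 3-broker cluster with 5 partitions, tries every combination of failed brokers, and reports the largest number of failures that never wipes out all replicas of any partition. It assumes every replica is fully in sync and ignores unclean leader election; the function names are hypothetical.

```python
from itertools import combinations


def assign_replicas(num_partitions, replication_factor, brokers):
    """Simplified round-robin placement (not Kafka's exact algorithm):
    each partition's replicas land on distinct consecutive brokers."""
    return {
        p: {brokers[(p + i) % len(brokers)] for i in range(replication_factor)}
        for p in range(num_partitions)
    }


def max_failures_without_loss(num_partitions, replication_factor, brokers):
    """Largest k such that EVERY set of k failed brokers leaves each
    partition with at least one surviving (assumed in-sync) replica."""
    assignment = assign_replicas(num_partitions, replication_factor, brokers)
    for k in range(1, len(brokers) + 1):
        for failed in combinations(brokers, k):
            # A partition is lost if all of its replicas are on failed brokers.
            if any(replicas <= set(failed) for replicas in assignment.values()):
                return k - 1
    return len(brokers)


brokers = [1, 2, 3]  # hypothetical 3-broker cluster
for rf in (1, 2, 3):
    print(f"RF={rf}: survives any {max_failures_without_loss(5, rf, brokers)} broker failure(s)")
```

The result is always the replication factor minus 1, which is why 3 replicas already cover the common case of one or two simultaneous broker failures.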

Mickael Maison answered Oct 04 '22 16:10