
Apache Kafka with 100 million topics

I'm trying to replace RabbitMQ with Apache Kafka, and while planning I ran into several conceptual design problems.

First, we currently use RabbitMQ with a per-user queue policy, meaning each user has their own queue. This suits our needs because each user represents some job to be done for that particular user, and if that user causes a problem, other users are unaffected because the queues are separate. (By "problem" I mean the following: messages in the queue are dispatched to the users via HTTP requests. If a user refuses to receive a message, perhaps because their server is down, the message goes back into a retry queue, so no messages are lost unless the queue itself goes down.)

Kafka, on the other hand, is fault tolerant because it persists messages to disk (and can replicate them across brokers), which is exactly why I want to bring Kafka into our architecture.

But there are problems with my plan.

First, I was thinking of creating one topic per user, meaning each user would have their own topic. What problems would this cause? My maximum estimate is around 1~5 million topics.

Second, if I instead create topics based on operation and partition them by a hash of the user ID, and one user is currently not consuming messages, will all the other users in that partition have to wait? What would be the best way to structure this?

In summary: we have 1~5 million users, and we do not want one user to block a large number of other users from being processed. A topic per user would solve this, but it seems there might be an issue with ZooKeeper if the number of topics gets that large (is this true?).

What would be the best way to structure this, with scalability in mind?

asked Jul 05 '16 06:07 by Hyounmin Wang




1 Answer

First, I was thinking of creating one topic per user, meaning each user would have their own topic. What problems would this cause? My maximum estimate is around 1~5 million topics.

I would advise against modeling like this.

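Search for "kafka topic limits" and you will find the relevant considerations. In short, every topic has at least one partition, each partition replica is a separate directory of log segment and index files on a broker, and the cluster has to track metadata for every partition (in ZooKeeper, for Kafka versions of this era). That per-partition overhead adds up quickly, so I think you will find you don't want millions of topics.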
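To make the scale concrete, here is a back-of-the-envelope sketch in plain Python (no Kafka client involved). It assumes one partition per topic, a replication factor of 3, and a 10-broker cluster; these numbers are illustrative assumptions, not recommendations.

```python
# Back-of-the-envelope sketch: how many partition replicas does a
# topic-per-user design create? All inputs are illustrative assumptions.


def partition_replicas(topics: int, partitions_per_topic: int, replication_factor: int) -> int:
    """Total partition replicas the cluster must store and track."""
    return topics * partitions_per_topic * replication_factor


def replicas_per_broker(total_replicas: int, brokers: int) -> float:
    """Average replicas per broker, assuming a perfectly even spread."""
    return total_replicas / brokers


if __name__ == "__main__":
    users = 5_000_000  # upper estimate from the question, one topic per user
    total = partition_replicas(users, partitions_per_topic=1, replication_factor=3)
    print(f"{total:,} partition replicas")  # 15,000,000 partition replicas
    print(f"{replicas_per_broker(total, brokers=10):,.0f} per broker")  # 1,500,000 per broker
```

Each of those replicas carries its own files, file handles, and metadata, which is why the count, rather than the message volume, becomes the limiting factor.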

Second, if I instead create topics based on operation and partition them by a hash of the user ID

Yes, use a single topic for these messages and route them based on the relevant field, such as user_id or conversation_id. That field can be carried on the message itself and also used as the ProducerRecord key, which determines the partition of the topic the message is written to. I would not put the operation in the topic name, but in the message itself.
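A minimal sketch of key-based routing, assuming a single topic with 12 partitions (an arbitrary illustrative number). It uses CRC32 as a stand-in hash; Kafka's default partitioner actually hashes the serialized key with murmur2, so the concrete partition numbers will differ, but the property that matters is the same: a given user_id always lands on the same partition, and the operation travels in the payload rather than the topic name.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition.

    Stand-in only: Kafka's default partitioner uses murmur2, not CRC32.
    Any deterministic hash shows the same property: equal keys always map
    to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 12  # illustrative partition count for the single topic

# Hypothetical messages: user_id is both a payload field and the record key;
# the operation lives in the message, not in the topic name.
messages = [
    {"user_id": "user-42", "operation": "send_invoice"},
    {"user_id": "user-7", "operation": "send_receipt"},
    {"user_id": "user-42", "operation": "send_receipt"},
]

if __name__ == "__main__":
    for msg in messages:
        p = partition_for(msg["user_id"], NUM_PARTITIONS)
        print(f'{msg["user_id"]} ({msg["operation"]}) -> partition {p}')
```

Both messages for user-42 print the same partition, so that user's messages stay in order relative to each other while other users are spread across the remaining partitions.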

If one user is currently not consuming messages, will all the other users in that partition have to wait? What would be the best way to structure this?

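This depends on how your consumers deliver messages to users. Within a partition, a consumer processes messages in order, so if it blocks while retrying delivery to one user, every message behind it in that partition waits too. To avoid that, you could set up a timeout, after which the message is routed to a separate "failed" topic and the consumer moves on. Or you could send messages to users UDP-style, without acknowledgements. There are many ways to model this, and it's hard to offer advice without knowing how your consumers forward messages to your clients.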
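Here is a minimal sketch of the timeout-then-park idea, with Kafka replaced by in-memory lists. `deliver` is a hypothetical stand-in for the HTTP call to a user's endpoint, and `Topics.failed` stands in for a "failed" topic that a separate retry consumer would drain; none of these names come from a Kafka API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    user_id: str
    payload: str


@dataclass
class Topics:
    """In-memory stand-in for a "failed" Kafka topic (hypothetical)."""
    failed: List[Message] = field(default_factory=list)


def process_partition(
    batch: List[Message],
    deliver: Callable[[Message, float], bool],
    topics: Topics,
    timeout_s: float = 2.0,
) -> List[Message]:
    """Consume one partition's messages in order without one user blocking the rest.

    `deliver` returns True on success and may raise TimeoutError. A failure
    or timeout parks the message in the "failed" topic instead of retrying
    inline, so the next message in the partition is handled immediately.
    """
    delivered = []
    for msg in batch:
        try:
            ok = deliver(msg, timeout_s)
        except TimeoutError:
            ok = False
        if ok:
            delivered.append(msg)
        else:
            topics.failed.append(msg)  # park it and move on to the next message
    return delivered


if __name__ == "__main__":
    unreachable = {"user-2"}  # pretend this user's server is down

    def fake_deliver(msg: Message, timeout_s: float) -> bool:
        # Simulates an HTTP POST that times out for unreachable users.
        if msg.user_id in unreachable:
            raise TimeoutError(f"{msg.user_id} did not answer within {timeout_s}s")
        return True

    topics = Topics()
    batch = [Message("user-1", "a"), Message("user-2", "b"), Message("user-3", "c")]
    done = process_partition(batch, fake_deliver, topics)
    print([m.user_id for m in done])  # ['user-1', 'user-3']
    print([m.user_id for m in topics.failed])  # ['user-2']
```

One trade-off of this design: once a message is parked, later messages for the same user can be delivered before it is retried, so per-user ordering is no longer guaranteed. If ordering matters, the consumer could also park every subsequent message for a user who already has one in the "failed" topic.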


Also, if you are using Kafka Streams, take note of the StreamPartitioner interface. It appears in the KStream and KTable methods that write records to a topic and lets you decide which partition each record goes to, which may be useful in a chat application where clients sit idle on a specific TCP connection.
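As a rough illustration of what a custom partitioner can do, here is a Python analog; the real StreamPartitioner is a Java interface, and this class is not its API. It assumes a hypothetical registry of which gateway node holds each user's TCP connection and which partition each gateway consumes. Users without a known connection fall back to a CRC32 hash, standing in for Kafka's default murmur2-based hashing.

```python
import zlib
from typing import Dict


class ConnectionAwarePartitioner:
    """Python analog of a custom partitioner (hypothetical design).

    Routes a user's messages to the partition consumed by the gateway node
    that currently holds that user's TCP connection, and falls back to a
    key hash when the user is not connected.
    """

    def __init__(self, gateway_of_user: Dict[str, str], partition_of_gateway: Dict[str, int]):
        self.gateway_of_user = gateway_of_user
        self.partition_of_gateway = partition_of_gateway

    def partition(self, topic: str, key: str, value: bytes, num_partitions: int) -> int:
        gateway = self.gateway_of_user.get(key)
        if gateway is not None and gateway in self.partition_of_gateway:
            # Send the message to the partition this gateway reads from.
            return self.partition_of_gateway[gateway] % num_partitions
        # Fallback: deterministic key hash (stand-in, not Kafka's murmur2).
        return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    partitioner = ConnectionAwarePartitioner(
        gateway_of_user={"user-42": "gw-b"},
        partition_of_gateway={"gw-a": 0, "gw-b": 1},
    )
    print(partitioner.partition("outbound", "user-42", b"hello", 4))  # 1 (connected via gw-b)
    print(partitioner.partition("outbound", "user-99", b"hello", 4))  # hashed fallback
```

For this to help, each gateway would need to read only its own partition (for example, via manual partition assignment rather than consumer-group rebalancing); that wiring is outside the scope of this sketch.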

answered Sep 28 '22 02:09 by Dmitry Minkovsky
