
What are the practical limits of Kafka regex-topics / listening to multiple topics

I am exploring different pub/sub platforms and was wondering what the limits are in Kafka for listening to multiple topics. Consider this use case: we have trains, station entry gates, and devices that all publish their telemetry. Currently this is done on an MQ, but as data rates increase (smart trains, etc.) we need to move to a new pub/sub or streaming platform, and Kafka is of course on that list.

As I see it there are two strategies for aggregating this telemetry into a stream:

  1. aggregate on consumption, in which each train/device initially gets its own topic and topic aggregation is done using a regex-topic / virtual topic
  2. aggregate on production, in which all trains produce to a single topic and consumers use filters if necessary to single out individual producers
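To make the two strategies concrete, here is a minimal broker-free sketch in plain Python. It only simulates the selection logic: the topic names, record keys, and the `filter_by_producer` helper are hypothetical and not part of any Kafka API. In a real client the first strategy would be a pattern subscription (for example `KafkaConsumer.subscribe(Pattern)` in the Java client), and the second would be an ordinary subscription followed by a filter on the record key.

```python
import re

# Hypothetical naming scheme: one topic per train or device (strategy 1).
topics = [f"telemetry.train.{i}" for i in range(3000)]
topics += ["telemetry.gate.1", "billing.events"]

# Strategy 1: aggregate on consumption.
# A regex subscription selects every topic whose name matches the pattern.
train_pattern = re.compile(r"telemetry\.train\.\d+")
subscribed = [t for t in topics if train_pattern.fullmatch(t)]

# Strategy 2: aggregate on production.
# All producers write to one shared topic; the record key identifies the producer.
shared_topic = [
    ("train-7", {"speed": 80}),
    ("gate-1", {"entries": 3}),
    ("train-7", {"speed": 82}),
]

def filter_by_producer(records, producer_id):
    """Keep only the values published by one producer (hypothetical helper)."""
    return [value for key, value in records if key == producer_id]

train_7 = filter_by_producer(shared_topic, "train-7")

print(len(subscribed))  # number of topics matched by the pattern
print(train_7)          # records from a single producer
```

The trade-off this illustrates: with strategy 1 the selection happens on topic names, so the topic count grows with the fleet, while with strategy 2 the topic count stays fixed and each consumer pays for reading and discarding records it does not need.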

As I understand it, Kafka is not particularly suited to a high number of topics (>10,000), but it can be done. Would a regex-topic be able to aggregate 2,000 or 3,000 topics?

Patrick Savalle asked Nov 06 '22 12:11


1 Answer

From a technical point of view it can be done, but in practice it is not common. Why? ZooKeeper. The usual advice is to keep a cluster to a maximum of about 4,000 partitions per broker. This is partly due to the overhead of performing leader election for all of those partitions through ZooKeeper.
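As a rough sanity check of that guideline against the 2,000 to 3,000 topics in the question, here is a back-of-the-envelope sketch. The partition counts, replication factor, and broker count are hypothetical inputs, and counting every replica against the per-broker budget is a conservative assumption of this sketch, not something stated above.

```python
import math

PARTITIONS_PER_BROKER_LIMIT = 4000  # guideline figure quoted in this answer

def partitions_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Average partition replicas hosted per broker, assuming an even spread."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return math.ceil(total_replicas / brokers)

def min_brokers(topics, partitions_per_topic, replication_factor,
                limit=PARTITIONS_PER_BROKER_LIMIT):
    """Smallest broker count that keeps every broker within the limit."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return math.ceil(total_replicas / limit)

# Hypothetical case A: 3000 topics, 1 partition each, replication factor 3, 3 brokers.
load_a = partitions_per_broker(3000, 1, 3, 3)   # 9000 replicas / 3 brokers
# Hypothetical case B: same, but 3 partitions per topic.
load_b = partitions_per_broker(3000, 3, 3, 3)   # 27000 replicas / 3 brokers

print(load_a, load_a <= PARTITIONS_PER_BROKER_LIMIT)  # 3000 True
print(load_b, load_b <= PARTITIONS_PER_BROKER_LIMIT)  # 9000 False
print(min_brokers(3000, 3, 3))                        # 7
```

Under these assumptions, one single-partition topic per train fits within the guideline on a small cluster, but the budget is used up quickly once each topic gets more than one partition.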

I recommend reading these posts on Confluent's blog about this topic:

  • How to choose the number of topics/partitions in a Kafka cluster?
  • Apache Kafka Supports 200K Partitions Per Cluster
  • Apache Kafka Made Simple: A First Glimpse of a Kafka Without ZooKeeper
marcosluis2186 answered Nov 15 '22 09:11