Kafka architecture many partitions or many topics?

I am looking to set up Kafka as an intermediary between data coming from IoT machines and a service that will process that data. I am having some issues identifying the proper way to design my topics for my use case and would love some advice.

I am looking to read sensor data from many machines, and each machine could have many sensors (e.g., temperature, pressure, parts, etc.). The order in which my consumers read these messages is important and needs to be sequential.

I have come up with three possible designs, but I am not sure which, if any, is best.

a) Each machine will write each sensor's data to its own topic with 1 partition to guarantee ordering. So machine 100 will write to topics called: machine100TempSensor1, machine100TempSensor2, machine100PressureSensor1, etc.

b) All machines will write to a single topic, but the partitions will be based on machine/sensor. Using the same example as above, machine 100 will write to a topic called 'temperature', but the messages will be keyed on the machine and sensor.

E.g.:
(Topic: temperature, partition : machine100TempSensor1)
(Topic: temperature, partition : machine100TempSensor2)
(Topic: temperature, partition : machine200TempSensor1)

c) Produce all temperature-related messages to a single temperature topic and filter the messages as I process the data.
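A minimal sketch of how design (b) preserves per-sensor ordering, assuming a hypothetical `temperature` topic with 8 partitions. Each record is keyed by machine/sensor, and a deterministic hash maps the key to a partition, so every reading for one key lands in the same partition in the order it was sent. The CRC32 hash is a stand-in chosen for illustration; Kafka's default Java partitioner hashes the serialized key with murmur2, so real partition numbers would differ.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count for the "temperature" topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a record key to a partition.

    CRC32 is a stand-in here; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records):
    """Append (key, value) records to per-partition logs in send order."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key)].append((key, value))
    return partitions


records = [
    ("machine100TempSensor1", 20.1),
    ("machine100TempSensor2", 19.8),
    ("machine100TempSensor1", 20.4),
    ("machine200TempSensor1", 22.0),
    ("machine100TempSensor1", 20.9),
]
logs = produce(records)

# Several keys may share a partition, but one key never spans two partitions,
# so the readings for a single sensor stay in send order.
p = partition_for("machine100TempSensor1")
print([v for k, v in logs[p] if k == "machine100TempSensor1"])  # [20.1, 20.4, 20.9]
```

Note that the partition count must stay fixed for this mapping to hold; adding partitions later changes which partition a key hashes to.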

My concerns with each solution:

a) - Kafka guarantees ordering only at the partition level, so would creating a topic with a single partition be a good idea, or does that go against what a topic should be?
- If I wanted to read 'Temperature' from all machines, I would have to know the names and request data from specific topics instead of a general 'Temperature' topic.
- Kafka states that only one consumer group can read from a single partition, so I would have to create many consumer groups.

b) - A single 'temperature' topic could have 30+ partitions, if not hundreds or thousands once I consider scaling (but I would have the benefit of reading all partitions at once).
- Since only a single consumer group is able to read from a single partition, I will have a consumer group for every consumer.

c) - I feel there could be a big performance cost in filtering out thousands of useless messages.
- I will run into the same issue when it comes time to push the processed data to Kafka.

Something to consider is that I would like to have the ability to process only certain machines/sensors.

Hopefully I have been able to explain everything clearly.

541

asked Feb 12 '18 01:02

Dimitrije M

1 Answer

Your overall understanding of Kafka is not 100% correct.

1) Kafka basically scales over partitions; thus, for the brokers, there is no difference (from a performance perspective) whether you use 1 topic with 1000 partitions or 1000 topics with 1 partition each. (If you plan to use Kafka Streams (aka the Streams API), using a single topic with 1000 partitions would be better, because Kafka Streams does not scale very well across topics.)

2) Creating single-partition topics to guarantee ordering is basically absolutely fine. To subscribe to multiple topics at once, you could use pattern subscription if you name the topics accordingly.
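To illustrate pattern subscription for design (a), the sketch below simulates locally which of a few hypothetical topic names a regular expression would select; it does not connect to a broker. The Java consumer accepts a `java.util.regex.Pattern` in `KafkaConsumer.subscribe(...)`; the full-name matching used here is an assumption of this sketch. It also shows how a consistent naming scheme lets you pick only certain machines or sensors.

```python
import re

# Hypothetical topic names following design (a): one single-partition topic
# per machine/sensor pair.
topics = [
    "machine100TempSensor1",
    "machine100TempSensor2",
    "machine100PressureSensor1",
    "machine200TempSensor1",
    "machine200PressureSensor1",
]


def matching_topics(pattern: str, names):
    """Return the topic names a pattern would select (whole-name match)."""
    regex = re.compile(pattern)
    return sorted(n for n in names if regex.fullmatch(n))


# "Temperature from all machines" without listing every topic by hand.
all_temperature = matching_topics(r"machine\d+TempSensor\d+", topics)
# Only the sensors of machine 100.
machine100_only = matching_topics(r"machine100\w+", topics)

print(all_temperature)
print(machine100_only)
```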

3) A single broker can host several thousand partitions. Thus, even with replication taken into account, you don't need a huge cluster.
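A quick worked example of the replication arithmetic, using hypothetical numbers: every partition is stored once per replica, so the load each broker carries is roughly partitions × replication factor ÷ brokers. This assumes replicas are spread evenly across brokers, which real placement only approximates.

```python
import math


def replicas_per_broker(partitions: int, replication_factor: int, brokers: int) -> int:
    """Partition replicas per broker, assuming an even spread (rounded up)."""
    return math.ceil(partitions * replication_factor / brokers)


# Hypothetical setup: 1000 partitions, replication factor 3.
print(replicas_per_broker(1000, 3, 3))  # 1000
print(replicas_per_broker(1000, 3, 5))  # 600
```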

4) This claim sounds incorrect (or maybe I misunderstand it):

Kafka states that only one consumer group can read from a single partition, so I would have to create many consumer groups.

Maybe you mean only one consumer within a single consumer group. Then it would be correct. Within a consumer group, you can assign (either manually or using the built-in consumer group management) each partition to at most one consumer of the group. You only need multiple consumer groups if multiple applications want to read the same partition.
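A simplified sketch of that rule, using a plain round-robin assignment over hypothetical consumer names. Kafka ships its own assignors (range, round-robin, sticky); this is not their code, only an illustration that within one group each partition goes to exactly one consumer, while a second group independently receives all partitions again.

```python
from collections import defaultdict


def assign(partitions, consumers):
    """Round-robin: each partition goes to exactly one consumer of the group."""
    assignment = defaultdict(list)
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)


partitions = list(range(6))  # partitions of a hypothetical "temperature" topic

# Two applications, each in its own consumer group, both read every partition.
analytics = assign(partitions, ["analytics-1", "analytics-2", "analytics-3"])
alerting = assign(partitions, ["alerting-1", "alerting-2"])

print(analytics)  # {'analytics-1': [0, 3], 'analytics-2': [1, 4], 'analytics-3': [2, 5]}
print(alerting)   # {'alerting-1': [0, 2, 4], 'alerting-2': [1, 3, 5]}
```

If a group has more consumers than partitions, the extra consumers simply receive nothing in this sketch, which mirrors why a group cannot usefully grow beyond the partition count.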

5) Your concern about (c) seems legit.
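To make the concern about (c) concrete, here is a sketch with a hypothetical workload: 100 machines with one temperature sensor each, 10 readings per sensor, all in one shared topic. A consumer that only cares about one sensor still has to fetch every record and discard the rest on the client side.

```python
def consume_filtered(records, wanted):
    """Design (c): fetch every record, keep only those whose key is wanted."""
    fetched = 0
    kept = []
    for key, value in records:
        fetched += 1
        if key in wanted:
            kept.append((key, value))
    return fetched, kept


# Hypothetical workload: machines 100..199, 10 readings per sensor.
records = [
    (f"machine{m}TempSensor1", reading)
    for reading in range(10)
    for m in range(100, 200)
]
fetched, kept = consume_filtered(records, wanted={"machine100TempSensor1"})
print(fetched, len(kept))  # 1000 10
```

Here 99% of the fetched records are thrown away, and that ratio grows with every machine added to the shared topic.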

answered Sep 30 '22 20:09

Matthias J. Sax


Tags: design-patterns, apache-kafka