 

Designing Kafka Topics - Many Topics vs. One Big Topic

Considering a stream of different events, would the recommended way be:

  • one big topic containing all events
  • multiple topics for different types of events

Which option would be better?

I understand that when messages are not in the same partition of a topic there is no ordering guarantee, but are there any other factors to be considered when making this decision?

asked Mar 11 '23 by user3452075


2 Answers

A topic is a logical abstraction and should contain messages of the same type. Let's say you monitor a website and capture click stream events, and on the other hand you have a database that populates its changes into a changelog topic. You should have two different topics, because click stream events are not related to your database changelog.

This has multiple advantages:

  • your data will have different formats and you will need different (de)serializers to write and read the data (with a single topic you would need a hybrid serializer and you would not get type safety when reading data)
  • you will have different consumer applications: one application might be interested in click stream events only, while a second application is only interested in the database changelog, and a third application is interested in both. If you have multiple topics, applications one and two subscribe only to the topics they are interested in; if you have a single topic, applications one and two need to read everything and filter out what they are not interested in, increasing broker, network, and client load (see the sketch after this list)
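A minimal sketch of that second point with the plain Java consumer API, assuming hypothetical topic names "clickstream" and "db-changelog", a local broker, and plain string payloads (your topic names, group ids, and (de)serializers would of course differ): the click stream application subscribes only to the topic it needs and never fetches the changelog.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ClickStreamConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "clickstream-app");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // This application only cares about click events, so it subscribes
                // to the "clickstream" topic only; the "db-changelog" topic is never
                // fetched, saving broker, network, and client work.
                consumer.subscribe(List.of("clickstream"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    records.forEach(r -> System.out.printf("click event %s=%s%n", r.key(), r.value()));
                }
            }
        }
    }

The changelog application would look the same but subscribe to "db-changelog" with its own value deserializer.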
answered May 04 '23 by Matthias J. Sax


As @Matthias J. Sax said before, there is no silver bullet here, but there are several factors to take into account.

The deciding factor: ordered delivery

If your application needs guaranteed ordering of delivery, you need to work with a single topic, and use the same key for the messages whose relative order must be preserved.
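For illustration, a small producer sketch, assuming a hypothetical "events" topic, a local broker, and string payloads: both records share the key "user-42", so the default partitioner routes them to the same partition and their relative order is preserved for consumers.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OrderedEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key => same partition => relative order is preserved.
                producer.send(new ProducerRecord<>("events", "user-42", "cart-created"));
                producer.send(new ProducerRecord<>("events", "user-42", "order-placed"));
                producer.flush();
            }
        }
    }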

If ordering is not mandatory, the game starts...

Is the schema the same for all messages?

Would consumers be interested in the same types of events, or in different ones?

What is going to happen on the consumer side? Are we reducing or increasing complexity in terms of implementation, maintainability, error handling...?

Is horizontal scalability important for us? More topics often mean more partitions available, which means more capacity for horizontal scaling. It also allows more fine-grained scaling configuration: on the broker side we can choose how many partitions to use per event type, and on the consumer side how many consumers to stand up per event type (see the sketch below).
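As a sketch of that per-event-type tuning with the Java AdminClient, assuming the same hypothetical topics and a local broker (the partition counts and replication factor are placeholder values, not recommendations): the high-volume click stream gets more partitions than the changelog, so each type can scale independently.

    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicsPerEventType {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // More partitions for the high-volume event type, fewer for the
                // low-volume one; values here are placeholders.
                admin.createTopics(List.of(
                        new NewTopic("clickstream", 12, (short) 3),
                        new NewTopic("db-changelog", 3, (short) 3)
                )).all().get();
            }
        }
    }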

Does it make sense to parallelise consumption per message type? ...

Technically speaking, if we allow consumers to fine-tune which types of events they consume, we potentially reduce the network bandwidth spent sending unwanted messages from the broker to the consumer, as well as the number of deserialisations (CPU used, which over time means more free resources, energy cost reduction...).

It is also worth remembering that splitting different types of messages into different topics doesn't mean you have to consume them with different Kafka consumers, because a single consumer can subscribe to several topics at the same time.
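For example, a single consumer can subscribe to both hypothetical topics and branch on the record's topic; this is a sketch under the same assumptions as above, not a prescribed pattern.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class MultiTopicConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "combined-app");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // One consumer, two topics: the record's topic tells us its type.
                consumer.subscribe(List.of("clickstream", "db-changelog"));
                while (true) {
                    consumer.poll(Duration.ofMillis(500)).forEach(record -> {
                        if ("clickstream".equals(record.topic())) {
                            System.out.println("click event: " + record.value());
                        } else {
                            System.out.println("changelog event: " + record.value());
                        }
                    });
                }
            }
        }
    }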

Well, there is no single clear answer to this question, but my feeling is that with Kafka, given all these features, if ordered delivery is not needed we should split our messages by type into different topics.

answered May 04 '23 by Dani