 

Kafka topic per producer

Let's say I have multiple devices, and each device has different types of sensors. I want to send the data from each sensor on each device to Kafka for real-time processing, but I am confused about how to structure the Kafka topics.

Is it better to have a Kafka topic per device, where all the sensors on that device send their data to that topic, or should I create one topic and have all the devices send their data to it?

If I go with the first case, where we create a topic per device:

Device1 (sensor A, B, C) -> topic1

Device2 (sensor A, B, C) -> topic2

  1. How many topics can I create?
  2. Will this model scale?

Case 2: sending all data to one topic

Device1 (sensor A, B, C), Device2 (sensor A, B, C)....DeviceN.... -> topic

  1. Isn't this going to be a bottleneck? Since the topic behaves like a queue, data from some sensors will sit far back in the queue and will not be processed in real time.

  2. Will this model scale?
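Whether Case 2 behaves like one big queue depends on partitioning: if each message is keyed by its device ID, the default partitioner hashes the key so one device's data always lands in the same partition, and partitions are consumed in parallel. A minimal stdlib-Python sketch of that hash-then-modulo idea (CRC32 stands in for Kafka's murmur2 hash; the partition count and device names are illustrative assumptions):

```python
import zlib

NUM_PARTITIONS = 12  # illustrative; fixed when the topic is created

def partition_for(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Keyed partitioning: hash the key, then take it modulo the partition count.

    Kafka's default partitioner uses murmur2; CRC32 stands in here so the
    sketch needs only the standard library.
    """
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions

# Every message from one device maps to one partition, so per-device ordering
# is preserved while different devices can be consumed in parallel.
for device in ("device-1", "device-2", "device-3"):
    print(device, "-> partition", partition_for(device))
```

Consumers in one consumer group then split the partitions among themselves, so the topic scales out rather than acting as a single FIFO queue.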

EDIT

Let's say each device is associated with a user (many-to-one), so I want to process data per device. The way I want to process the data is: each device's sensor data goes to its user after some processing.

Say I do the following:

Device1

-> Sensor A - Topic1 Partition 1

-> Sensor B - Topic1 Partition 2

Device2

-> Sensor A - Topic2 Partition 1

-> Sensor B - Topic2 Partition 2

I want some pub/sub type of behavior. Since devices can be added or removed, and sensors can too, is there a way to create these topics and partitions on the fly?

If not Kafka, what pub/sub system would be suitable for this kind of behavior?

asked Sep 27 '16 by big

2 Answers

It depends on your semantics:

  • a topic is a logical abstraction and should contain "uniform" data, i.e., data with the same semantic meaning
  • a topic can easily be scaled out via its number of partitions

For example, if you have different types of sensors collecting different data, you should use a separate topic for each type.

Since devices can be added or removed, and sensors can too, is there a way to create these topics and partitions on the fly?

If device metadata (to distinguish where data comes from) is embedded in each message, you should use a single topic with many partitions to scale out. Adding new topics or partitions is possible but must be done manually. The problem with adding new partitions is that it can change your data distribution and thus might break semantics. Thus, the best practice is to over-partition your topic from the beginning to avoid adding new partitions later.
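A small plain-Python illustration of why growing the partition count reshuffles keyed data, which is what the over-partitioning advice avoids (the key hashes are made-up values, not real murmur2 output):

```python
def partition_for(key_hash: int, num_partitions: int) -> int:
    # Hash-then-modulo assignment, as Kafka's default partitioner does.
    return key_hash % num_partitions

# Made-up hash values for three hypothetical device keys.
key_hashes = {"device-a": 7, "device-b": 12, "device-c": 25}

for device, h in key_hashes.items():
    old = partition_for(h, 6)  # topic originally created with 6 partitions
    new = partition_for(h, 8)  # after growing it to 8 partitions
    marker = "  (moved!)" if old != new else ""
    print(f"{device}: partition {old} -> {new}{marker}")
# device-a and device-b change partitions, so records written before and
# after the change for the same device end up in different partitions.
```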

If there is no embedded metadata, you would need multiple topics (e.g., per user or per device) to distinguish message origins.

As an alternative, a single topic with multiple partitions and a fixed mapping from device/sensor to partition -- using a custom partitioner -- would work, too. In this case, adding new partitions is no problem, as you control the data distribution and can keep it stable.
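That custom-partitioner idea can be sketched in plain Python (in the Java client this would be a `Partitioner` implementation; the lookup table and device names here are illustrative assumptions):

```python
# Application-maintained device -> partition table. Because the assignment
# does not depend on the total partition count, growing the topic later
# never moves an existing device's data.
DEVICE_PARTITION = {"device-1": 0, "device-2": 1, "device-3": 2}

def custom_partition(device_id: str, num_partitions: int) -> int:
    try:
        return DEVICE_PARTITION[device_id]
    except KeyError:
        # A new device is pinned to the next partition and remembered.
        partition = len(DEVICE_PARTITION) % num_partitions
        DEVICE_PARTITION[device_id] = partition
        return partition

print(custom_partition("device-2", 4))  # 1
print(custom_partition("device-2", 8))  # still 1 after adding partitions
print(custom_partition("device-4", 8))  # new device pinned to 3
```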

Update

There is a blog post discussing this: https://www.confluent.io/blog/put-several-event-types-kafka-topic/

answered Oct 18 '22 by Matthias J. Sax


I would create topics based on sensors and partitions based on devices:

A sensor on Device 1 -> topic A, partition 1.
A sensor on Device 2 -> topic A, partition 2.
B sensor on Device 2 -> topic B, partition 2.

and so on.
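The topic-per-sensor, partition-per-device mapping above can be sketched as a small routing function (the topic names and partition count are illustrative assumptions):

```python
SENSOR_TOPICS = {"A": "sensor-a-readings", "B": "sensor-b-readings"}

def route(sensor: str, device_num: int, partitions_per_topic: int = 16):
    """Return the (topic, partition) a reading should be produced to."""
    topic = SENSOR_TOPICS[sensor]
    # Device numbers map directly onto partitions; with more devices than
    # partitions, several devices would share one partition.
    return topic, device_num % partitions_per_topic

print(route("A", 1))  # ('sensor-a-readings', 1)
print(route("B", 2))  # ('sensor-b-readings', 2)
```

A consumer interested only in sensor-A data then subscribes to that single topic, and can still process devices in parallel across its partitions.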

I don't know what kind of sensors you have, but they seem to belong semantically to the same set of data. With the help of partitions you can have parallel processing.

But it depends on how you want to process your data: is it more important to process by sensor or by device?

answered Oct 18 '22 by Balázs Németh