 

multiprocessing in kafka-python

I have been using the kafka-python module to consume from a Kafka broker. I want to consume from a single topic that has 'x' partitions, reading those partitions in parallel. The documentation has this:

from kafka import KafkaConsumer

# Use multiple consumers in parallel w/ 0.9 kafka brokers
# typically you would run each on a different server / process / CPU
consumer1 = KafkaConsumer('my-topic',
                          group_id='my-group',
                          bootstrap_servers='my.server.com')
consumer2 = KafkaConsumer('my-topic',
                          group_id='my-group',
                          bootstrap_servers='my.server.com')

Does this mean I can create a separate consumer for each process that I spawn? Also, will there be any overlap in the messages consumed by consumer1 and consumer2?

Thanks

asked May 24 '16 at 14:05 by red_devil



1 Answer

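Yes, you can create multiple consumers in multiple threads/processes (and even run them in parallel on different machines), as long as each thread or process has its own KafkaConsumer instance. As long as all consumers use the same group.id (group_id in kafka-python), there will be no overlap: Kafka assigns each topic partition to exactly one consumer within a consumer group. (The one exception is a rebalance: messages that were processed but not yet committed can be delivered again to whichever consumer takes over the partition.) Be aware that using more consumers than there are topic partitions will leave the extra consumers idle.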
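A minimal sketch of "one consumer per process" with the standard multiprocessing module, assuming kafka-python is installed and reusing the topic, group and server values from the documentation snippet above. The function names `consume` and `start_consumers` are hypothetical, and the `print` call is a placeholder for real processing. Each KafkaConsumer is created inside the child process rather than passed in from the parent, so no network connections are shared between processes.

```python
import multiprocessing
from typing import List


def consume(worker_id: int, topic: str, group_id: str, bootstrap_servers: str) -> None:
    """Run one KafkaConsumer inside its own process (hypothetical helper)."""
    # Imported inside the child so the consumer and its connections are
    # created after the process starts; also lets this module import
    # without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        group_id=group_id,  # same group for every worker: partitions are split among them
        bootstrap_servers=bootstrap_servers,
    )
    try:
        for message in consumer:
            # Placeholder processing step: replace with real work.
            print(f"worker {worker_id}: partition={message.partition} offset={message.offset}")
    finally:
        consumer.close()


def start_consumers(num_workers: int, topic: str, group_id: str,
                    bootstrap_servers: str) -> List[multiprocessing.Process]:
    """Spawn num_workers processes, each with its own consumer in the same group."""
    processes = [
        multiprocessing.Process(target=consume, args=(i, topic, group_id, bootstrap_servers))
        for i in range(num_workers)
    ]
    for p in processes:
        p.start()
    return processes
```

To use it, call something like `start_consumers(3, 'my-topic', 'my-group', 'my.server.com')` from inside an `if __name__ == "__main__":` guard (required on platforms that spawn rather than fork) and then `join()` each returned process. Choosing `num_workers` equal to the topic's partition count keeps every process busy.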

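Both the "no overlap" and the "idle consumers" points follow from how a group divides partitions. As a rough illustration, here is a simplified pure-Python model of range-style assignment for a single topic. The function `range_assign` and the consumer names are hypothetical; this is not Kafka's or kafka-python's actual assignor code, which also handles multiple topics and other strategies.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Simplified range-style assignment of one topic's partitions.

    Each partition goes to exactly one consumer; when there are more consumers
    than partitions, the extra consumers receive an empty list (they sit idle).
    """
    if not consumers:
        raise ValueError("need at least one consumer")
    members = sorted(consumers)  # deterministic member order
    per_member, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each get one additional partition.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    print(range_assign(5, ["consumer1", "consumer2"]))
    # {'consumer1': [0, 1, 2], 'consumer2': [3, 4]}
    print(range_assign(3, ["consumer1", "consumer2", "consumer3", "consumer4"]))
    # {'consumer1': [0], 'consumer2': [1], 'consumer3': [2], 'consumer4': []}
```

Every partition appears in exactly one list, which is the "no overlap" property, and with four consumers on a three-partition topic one consumer is assigned nothing.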
answered Oct 23 '22 at 04:10 by Matthias J. Sax
