I am looking to productionize and deploy my Kafka Connect application. However, I have two questions about the tasks.max setting, which is required and of high importance, but whose documentation is vague about what value to actually set.
If I have a topic with n partitions that I wish to consume data from and write to some sink (in my case, I am writing to S3), what should I set tasks.max to? Should I set it to n? Should I set it to 2n? Intuitively it seems that I'd want to set the value to n and that's what I've been doing.
What if I change my Kafka topic and increase the number of partitions on it? If I set tasks.max to n, will I have to pause my Kafka connector and increase tasks.max? If I have instead set a value of 2n, should my connector automatically increase its parallelism?
4 topics, 5 partitions each (20 topic partitions total) with tasks.max set to 10 - the Kafka connector will spawn 10 tasks, each handling data from 2 topic partitions.
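The arithmetic above can be sketched in a few lines of Python. This is a rough illustration of the even, round-robin style grouping, not Connect's exact assignment code:

```python
def assign_partitions(partitions, max_tasks):
    """Group topic partitions across at most max_tasks tasks.

    The number of tasks is capped at the partition count, since a task
    with no partitions would sit idle.
    """
    num_tasks = min(len(partitions), max_tasks)
    groups = [[] for _ in range(num_tasks)]
    for i, partition in enumerate(partitions):
        groups[i % num_tasks].append(partition)
    return groups

# 4 topics x 5 partitions = 20 topic partitions
partitions = [(f"topic-{t}", p) for t in range(4) for p in range(5)]
assignments = assign_partitions(partitions, max_tasks=10)
print(len(assignments))               # 10 tasks
print([len(g) for g in assignments])  # 2 partitions per task
```

With tasks.max raised above 20, the same sketch would cap out at 20 tasks, one per partition.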
Distributed mode provides scalability and automatic fault tolerance for Kafka Connect. In distributed mode, you start many worker processes using the same group.id and they automatically coordinate to schedule execution of connectors and tasks across all available workers.
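As an illustration, a distributed worker's properties file might look like the sketch below. The broker addresses, group.id, and topic names are placeholders; the keys themselves are standard Connect worker settings:

```properties
# connect-distributed.properties (illustrative values)
bootstrap.servers=kafka1:9092,kafka2:9092
# Workers sharing the same group.id form one Connect cluster and
# rebalance connectors and tasks among themselves automatically.
group.id=connect-cluster
# Internal topics Connect uses to share state across workers
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```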
Tasks contain the code that actually copies data to/from another system. They receive a configuration from their parent Connector, assigning them a fraction of a Kafka Connect job's work. The Kafka Connect framework then pushes/pulls data from the Task. The Task must also be able to respond to reconfiguration requests.
A Kafka Connect process is called a worker, and there are two modes for running workers. Standalone mode is useful for development and testing Kafka Connect on a local machine. It can also be used for environments that typically use single agents (for example, sending web server logs to Kafka).
In a Kafka Connect sink, the tasks are essentially consumer threads that receive partitions to read from. If you have 10 partitions and tasks.max set to 5, each task will receive 2 partitions to read from and will track their offsets. If you have configured tasks.max to a number above the partition count, Connect will launch a number of tasks equal to the partition count of the topics it's reading. If you change the partition count of the topic, you'll have to relaunch your Connect task; if tasks.max is still greater than the new partition count, Connect will start a number of tasks equal to that partition count.
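For reference, tasks.max is set alongside the other connector properties in the config you submit to Connect. The sketch below assumes Confluent's S3 sink connector; the name, topic, and bucket values are placeholders:

```json
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "10",
    "topics": "my-topic",
    "s3.bucket.name": "my-bucket",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
```

With 10 topic partitions, this would yield 10 tasks; with fewer partitions, Connect would launch only as many tasks as there are partitions.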
Edit: I just discovered ConnectorContext: https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/connector/ConnectorContext.html
The connector will have to be written to use it, but it looks like Connect has the ability to reconfigure a connector if there's a topic change (partitions added/removed).
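The pattern ConnectorContext enables can be sketched generically: the connector polls partition metadata in the background and requests task reconfiguration when the count changes. This is plain Python standing in for the Java API, with hypothetical callbacks for fetching the partition count and triggering reconfiguration:

```python
import threading

class PartitionMonitor:
    """Background monitor that triggers reconfiguration on partition changes.

    fetch_partition_count and request_reconfiguration are placeholders for
    the real metadata lookup and ConnectorContext.requestTaskReconfiguration().
    """

    def __init__(self, fetch_partition_count, request_reconfiguration,
                 poll_interval=5.0):
        self._fetch = fetch_partition_count
        self._reconfigure = request_reconfiguration
        self._interval = poll_interval
        self._last_count = None
        self._stop = threading.Event()

    def check_once(self):
        # Compare the current partition count to the last observed one
        count = self._fetch()
        if self._last_count is not None and count != self._last_count:
            self._reconfigure()
        self._last_count = count

    def run(self):
        # Poll until stop() is called
        while not self._stop.wait(self._interval):
            self.check_once()

    def stop(self):
        self._stop.set()

# Simulate a topic growing from 5 to 8 partitions
calls = []
counts = iter([5, 5, 8])
monitor = PartitionMonitor(lambda: next(counts),
                           lambda: calls.append("reconfigure"))
monitor.check_once()
monitor.check_once()
monitor.check_once()
print(calls)  # ['reconfigure'] - fired only when the count changed
```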