
Ideal configuration setting for tasks.max for Kafka Source Connector

I'm trying to run an HDFS Source Connector and a FileStream Source Connector. I was wondering how they would behave if we set tasks.max > 1. Isn't it the connector's job to make sure that parallelism is handled correctly?

For example, wouldn't it be a problem for the FileStream Source Connector if more than one task accessed the file? How would the connector know which line is being read by which task, and how would it make sure the tasks don't clash?

Or should the setting simply be tasks.max=1 for connectors where such a problem can occur?

guru asked Nov 15 '25 05:11

1 Answer

There is no such problem. According to the docs:

tasks.max - The maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.

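For example, the FileStream Source Connector simply ignores tasks.max and always runs a single task, while for the JDBC Source Connector the actual number of tasks is the minimum of tasks.max and the number of tables. In other words, tasks.max is only an upper bound: the connector itself decides how many tasks to create and how to split the work among them.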
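To make the "upper bound" behaviour concrete, here is a minimal Python sketch of the decision a connector makes when the framework hands it the tasks.max value (in Kafka Connect this happens in the connector's `taskConfigs(int maxTasks)` method). This is only a model of the logic, not the connectors' actual code: the function names, the config keys, and the round-robin assignment of tables are assumptions made for illustration.

```python
from typing import Dict, List


def file_stream_task_configs(max_tasks: int, filename: str, topic: str) -> List[Dict[str, str]]:
    """Hypothetical model of a single-file source.

    One file can only be read sequentially, so exactly one task config is
    returned and max_tasks is ignored.
    """
    return [{"file": filename, "topic": topic}]


def table_task_configs(max_tasks: int, tables: List[str]) -> List[Dict[str, str]]:
    """Hypothetical model of a table-based source.

    The number of tasks is min(max_tasks, number of tables); each table is
    assigned to exactly one task, so no two tasks read the same table.
    """
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    num_groups = min(max_tasks, len(tables))
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    # Round-robin assignment (an assumption; a real connector may group differently).
    for i, table in enumerate(tables):
        groups[i % num_groups].append(table)
    return [{"tables": ",".join(group)} for group in groups]


if __name__ == "__main__":
    # tasks.max=8, but only one task is created for the file source.
    print(len(file_stream_task_configs(8, "test.txt", "connect-test")))
    # tasks.max=8 with three tables: three tasks, one table each.
    print(table_task_configs(8, ["orders", "users", "items"]))
    # tasks.max=2 with three tables: two tasks sharing the three tables.
    print(table_task_configs(2, ["orders", "users", "items"]))
```

Because every unit of work (the file, or each table) ends up in exactly one task config, tasks never compete for the same input, whatever value tasks.max has.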

Iskuskov Alexander answered Nov 17 '25 20:11

