 

Not clear about the meaning of auto.offset.reset and enable.auto.commit in Kafka


I am new to Kafka and I don't really understand the meaning of the Kafka configuration settings. Can anyone explain them to me in a more understandable way?

Here is my code:

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "master:9092,slave1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "GROUP_2017",
  "auto.offset.reset" -> "latest", // earliest or latest
  "enable.auto.commit" -> (true: java.lang.Boolean)
)

What does each of these settings mean in my code?

asked Jul 05 '17 13:07 by Gpwner


People also ask

What is auto offset reset in Kafka?

Use auto.offset.reset to define the behavior of the consumer when there is no committed position (which is the case when the group is first initialized) or when an offset is out of range. You can reset the position either to the “earliest” offset or to the “latest” offset (the default).

What is enable auto commit in Kafka?

By default, as the consumer reads messages from Kafka, it periodically commits its current offset (defined as the offset of the next message to be read) for the partitions it is reading from back to Kafka. Often you want more control over exactly when offsets are committed.

What is auto offset reset earliest in Kafka?

The auto.offset.reset consumer configuration defines how a consumer behaves when it consumes from a topic partition that has no initial (committed) offset. This is most often relevant when a new consumer group has been defined and is listening to a topic for the first time. With earliest, the consumer starts from the oldest offset still available in the partition.

What is commit offset in Kafka?

The Apache Kafka Offset Commit activity notifies the Kafka Consumer Trigger to commit a given offset. This is useful if you want offsets to be committed as soon as the record is processed in the flow. By default, offsets are committed only when the flow executes successfully.


1 Answer

I will explain what each setting means, but I highly suggest reading the Configuration page on the Kafka website.

"bootstrap.servers" -> "master:9092,slave1:9092"

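The list of Kafka brokers (host and port) that the client uses for its initial connection to the cluster. It does not have to contain every broker: once connected, the client discovers the rest of the cluster from the metadata it receives.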
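To make the format concrete, here is a minimal sketch that splits a bootstrap.servers value into host/port pairs. The `BootstrapServers.parse` helper is hypothetical and written only for illustration; the Kafka client does its own parsing and validation internally.

```scala
// Hypothetical helper, NOT Kafka's own parser: splits a bootstrap.servers
// value ("host1:port1,host2:port2,...") into (host, port) pairs.
object BootstrapServers {
  def parse(value: String): Seq[(String, Int)] =
    value.split(",").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { entry =>
        // Split on the last ':' so the port is always the final component.
        val idx = entry.lastIndexOf(':')
        require(idx > 0, s"expected host:port but got '$entry'")
        (entry.substring(0, idx), entry.substring(idx + 1).toInt)
      }
}
```

Applied to the question's value, `parse("master:9092,slave1:9092")` yields `("master", 9092)` and `("slave1", 9092)`: two entry points into the same cluster, not a full list of its brokers.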

 "key.deserializer" -> classOf[StringDeserializer]
 "value.deserializer" -> classOf[StringDeserializer]

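This SO answer explains their purpose. In short, they tell the consumer how to turn the raw bytes of each record's key and value into objects; StringDeserializer decodes them as UTF-8 strings.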
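A pure-Scala sketch of what StringDeserializer does with each key or value, assuming its default UTF-8 encoding. The object name `StringDecoding` is hypothetical; the real class implements Kafka's `Deserializer` interface, whose `deserialize(topic, data)` method this function mirrors.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for StringDeserializer (no Kafka dependency):
// record bytes in, String out; a null payload stays null.
object StringDecoding {
  def deserialize(topic: String, data: Array[Byte]): String =
    if (data == null) null
    else new String(data, StandardCharsets.UTF_8) // the topic is not needed for plain decoding
}
```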

"group.id" -> "GROUP_2017"

Each consumer process belongs to a consumer group, identified by its group.id. A group can have multiple consumers, and Kafka assigns each partition to exactly one consumer within the group (a single consumer may read several partitions). If the number of consumers is greater than the number of partitions available, some consumers will be idle.
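The following sketch shows the "each partition goes to one consumer" rule for a single topic. It is loosely modelled on Kafka's range assignor, but the `GroupAssignment.assign` function is a hypothetical simplification: the real assignment strategy is pluggable (partition.assignment.strategy) and is computed during a group rebalance.

```scala
// Simplified, single-topic model of range-style partition assignment.
// Hypothetical helper: Kafka's real assignors are pluggable and run during a rebalance.
object GroupAssignment {
  // Every partition goes to exactly one consumer. Consumers are sorted by id and
  // the first (partitions % consumers) of them receive one extra partition.
  def assign(consumers: Seq[String], partitions: Int): Map[String, Seq[Int]] = {
    require(consumers.nonEmpty, "a group needs at least one consumer")
    val sorted = consumers.sorted
    val base = partitions / sorted.size
    val extra = partitions % sorted.size
    sorted.zipWithIndex.map { case (consumer, i) =>
      val first = i * base + math.min(i, extra)
      val count = base + (if (i < extra) 1 else 0)
      consumer -> (first until first + count)
    }.toMap
  }
}
```

For instance, `assign(Seq("c1", "c2", "c3"), 2)` gives c1 partition 0, c2 partition 1, and leaves c3 with no partitions at all: the idle-consumer case.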

"enable.auto.commit" -> (true: java.lang.Boolean)

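If this flag is true, the consumer periodically (every auto.commit.interval.ms, 5 seconds by default) commits its current position, i.e. the offset of the next record to read, so the group's progress is persisted. With the consumer API used here (configured through bootstrap.servers), committed offsets are stored in Kafka itself, in the internal __consumer_offsets topic; only the old consumer stored them in ZooKeeper. This approach is not the best choice for a robust production system, because the commit is not tied to your code: it does not ensure that the records you fetched were actually processed correctly by the logic you wrote, so after a crash some records can be skipped or processed twice.

If this flag is false and you do not commit offsets yourself, no position is stored for the group, so when you restart the process it will start reading from the 'earliest' or the 'latest' offset, depending on the value of the next flag (auto.offset.reset). Finally, this Cloudera article explains in detail how to manage offsets properly.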
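The risk can be illustrated with a deliberately simplified in-memory model (no Kafka classes involved). Here "auto-commit" is modelled as committing a batch's end offset as soon as the batch is fetched, which approximates the situation where processing is decoupled from fetching, as in Spark Streaming; the real client commits from inside poll() every auto.commit.interval.ms, so the exact window differs. The "manual" variant commits only after the whole batch has been processed. In both cases the consumer crashes partway through, restarts from the committed offset, and falls back to offset 0 (as with 'earliest') when nothing was committed. `CommitModel` and its parameters are hypothetical.

```scala
// Simplified in-memory model of a consumer that crashes and restarts.
// Hypothetical code: it does not use or reproduce the Kafka client API.
object CommitModel {
  // One run over offsets [start, logSize) in batches of batchSize. If crashAfter is set,
  // the run stops once that many records have been processed.
  // Returns the offsets processed in this run and the committed offset afterwards.
  private def runOnce(
      start: Int,
      logSize: Int,
      batchSize: Int,
      commitOnFetch: Boolean,
      crashAfter: Option[Int],
      initialCommit: Option[Int]
  ): (Vector[Int], Option[Int]) = {
    var committed = initialCommit
    var processed = Vector.empty[Int]
    var pos = start
    var crashed = false
    while (!crashed && pos < logSize) {
      val end = math.min(pos + batchSize, logSize)
      // "Auto-commit" model: the batch end is committed before any work is done.
      if (commitOnFetch) committed = Some(end)
      var offset = pos
      while (!crashed && offset < end) {
        if (crashAfter.contains(processed.size)) crashed = true
        else { processed :+= offset; offset += 1 }
      }
      // "Manual" model: commit only once the whole batch has been processed.
      if (!crashed && !commitOnFetch) committed = Some(end)
      pos = end
    }
    (processed, committed)
  }

  // Crash after `crashAfter` records, then restart from the committed offset
  // (or from 0, like auto.offset.reset=earliest, if nothing was committed).
  def simulate(logSize: Int, batchSize: Int, commitOnFetch: Boolean, crashAfter: Int): Vector[Int] = {
    val (firstRun, committed) =
      runOnce(0, logSize, batchSize, commitOnFetch, Some(crashAfter), None)
    val (secondRun, _) =
      runOnce(committed.getOrElse(0), logSize, batchSize, commitOnFetch, None, committed)
    firstRun ++ secondRun
  }
}
```

With 10 records, batches of 4 and a crash after 2 records, `simulate(10, 4, commitOnFetch = true, crashAfter = 2)` returns offsets 0, 1 and 4 to 9, so offsets 2 and 3 are never processed. With `commitOnFetch = false`, offsets 0 and 1 are processed twice but nothing is lost. Trading possible loss for possible duplicates is why committing after processing is usually preferred.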

"auto.offset.reset" -> "latest"

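This flag tells the consumer where to start reading when the group has no committed offset for a partition yet, or when the committed offset is no longer valid (for example, because those records were deleted by retention). In other words, it starts either from the 'earliest' offset still available or from the 'latest' offset (only records produced after the consumer starts) if no offset has been committed yet, whether manually or through the enable.auto.commit flag.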
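The decision can be sketched as a small function for one partition. `OffsetReset.startingOffset` and its types are hypothetical; `logStart` stands for the oldest offset still retained and `logEnd` for the offset the next produced record will receive. The `NoReset` case models the third accepted value, `none`, for which the real consumer throws an exception instead of resetting (an `IllegalStateException` stands in for Kafka's own exception types here).

```scala
// Hypothetical sketch of how a consumer picks its starting offset for one partition.
object OffsetReset {
  sealed trait Policy
  case object Earliest extends Policy // auto.offset.reset=earliest
  case object Latest extends Policy   // auto.offset.reset=latest (the default)
  case object NoReset extends Policy  // auto.offset.reset=none

  // logStart: oldest offset still retained; logEnd: offset the next produced record gets.
  def startingOffset(committed: Option[Long], logStart: Long, logEnd: Long, policy: Policy): Long =
    committed match {
      // A committed offset that is still in range always wins; the policy is ignored.
      case Some(offset) if offset >= logStart && offset <= logEnd => offset
      // No commit yet, or the committed offset is out of range: apply the policy.
      case _ =>
        policy match {
          case Earliest => logStart
          case Latest   => logEnd
          case NoReset  =>
            throw new IllegalStateException("no valid committed offset and auto.offset.reset=none")
        }
    }
}
```

With the question's settings (latest), a brand-new group therefore begins at the end of the log and only sees new messages; once offsets have been committed, restarts resume from them and the reset policy no longer applies.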

answered Oct 22 '22 12:10 by dbustosp