After setting up the Kafka broker cluster and creating a few topics, we found that the following two topics were automatically created by Kafka:
__consumer_offsets
_schema
What is the importance and use of these topics?
__consumer_offsets is used to store information about committed offsets for each topic:partition per group of consumers (groupID). It is a compacted topic, so older records are periodically cleaned up and only the latest offset information for each key remains available.
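As a rough illustration of what compaction means for this topic, the sketch below models committed offsets as keyed records and keeps only the newest value per key. The key layout (group, topic, partition) and the sample group and topic names are assumptions for illustration; this is not Kafka's actual record format.

```python
from typing import Dict, List, Tuple

# Assumed key layout for illustration: (group_id, topic, partition).
OffsetKey = Tuple[str, str, int]


def compact(log: List[Tuple[OffsetKey, int]]) -> Dict[OffsetKey, int]:
    """Keep only the most recent committed offset for each key."""
    latest: Dict[OffsetKey, int] = {}
    for key, offset in log:
        # A later record for the same key replaces the earlier one.
        latest[key] = offset
    return latest


# Hypothetical commit history, oldest first.
commits = [
    (("billing", "orders", 0), 10),
    (("billing", "orders", 1), 7),
    (("billing", "orders", 0), 25),
    (("audit", "orders", 0), 3),
]

compacted = compact(commits)
for key, offset in sorted(compacted.items()):
    print(key, offset)
```

After compaction the earlier commit of 10 for ("billing", "orders", 0) is gone and only 25 remains, which is all a restarting consumer needs.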
__consumer_offsets is an internal Kafka topic and cannot be deleted with the delete topic command. It contains information about committed offsets for each topic:partition for each group of consumers (groupID). If you want to wipe it out entirely, you have to delete the ZooKeeper dataDir location.
current-offset is the last committed offset of the consumer instance; log-end-offset is the highest offset of the partition (hence, summing this column gives you the total number of messages for the topic, provided nothing has yet been removed from the partitions by retention or compaction).
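To make the relationship between these two columns concrete, here is a small sketch that computes per-partition lag and the column sums. The rows are made-up sample values, not output from a real cluster, and the field names are only illustrative.

```python
from typing import Dict, List

# Hypothetical rows: one per partition of a single topic for one consumer group.
rows: List[Dict[str, int]] = [
    {"partition": 0, "current_offset": 120, "log_end_offset": 150},
    {"partition": 1, "current_offset": 80, "log_end_offset": 80},
    {"partition": 2, "current_offset": 40, "log_end_offset": 95},
]


def partition_lag(row: Dict[str, int]) -> int:
    """Records written to the partition but not yet committed by the group."""
    return row["log_end_offset"] - row["current_offset"]


lags = {row["partition"]: partition_lag(row) for row in rows}
total_lag = sum(lags.values())

# Summing log-end-offset approximates the message count only while
# every partition still starts at offset 0.
total_log_end = sum(row["log_end_offset"] for row in rows)

print(lags, total_lag, total_log_end)
```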
Kafka topics are divided into several partitions. While the topic is a logical concept in Kafka, a partition is the smallest storage unit and holds a subset of the records owned by a topic. Each partition is a single log to which records are written in an append-only fashion.
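The topic/partition relationship can be sketched as a toy in-memory model. The class names are hypothetical, and the CRC32-based partition choice is an assumption made to keep the example deterministic; it is not Kafka's own partitioner.

```python
import zlib
from typing import List, Tuple


class PartitionSketch:
    """An append-only log: records get increasing offsets and are never rewritten."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        offset = len(self._records)  # next free offset
        self._records.append(record)
        return offset

    def read(self, offset: int) -> str:
        return self._records[offset]


class TopicSketch:
    """A topic is only a name plus a set of partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [PartitionSketch() for _ in range(num_partitions)]

    def send(self, key: str, value: str) -> Tuple[int, int]:
        # Assumed partitioning rule for this sketch: CRC32 of the key modulo partition count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return index, self.partitions[index].append(value)


topic = TopicSketch("orders", 3)
first = topic.send("user-1", "created")
second = topic.send("user-1", "paid")
print(first, second)
```

Records with the same key land in the same partition, and within that partition their offsets grow by one per append.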
_schema is not a default Kafka topic (at least as of Kafka 0.8 and 0.9). It is added by Confluent. See more: Confluent Schema Registry - github.com/confluentinc/schema-registry (thanks @serejja)
__consumer_offsets: Every consumer group maintains its own offset for each topic partition. Since v0.9, the committed offsets for every consumer group are stored in this internal topic (prior to v0.9 this information was stored in ZooKeeper). When the offset manager receives an OffsetCommitRequest, it appends the request to a special compacted Kafka topic named __consumer_offsets. Finally, the offset manager sends a successful offset commit response to the consumer only once all the replicas of the offsets topic have received the offsets.
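The commit flow described here can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class names, the `online` flag, and the idea of calling replicas directly are all hypothetical and stand in for Kafka's real replication protocol.

```python
from typing import Dict, List, Optional, Tuple

OffsetKey = Tuple[str, str, int]  # (group_id, topic, partition), assumed layout


class ReplicaSketch:
    """Stand-in for one replica of the offsets topic."""

    def __init__(self, online: bool = True) -> None:
        self.online = online
        self.log: List[Tuple[OffsetKey, int]] = []

    def append(self, key: OffsetKey, offset: int) -> bool:
        if not self.online:
            return False
        self.log.append((key, offset))
        return True


class OffsetManagerSketch:
    def __init__(self, replicas: List[ReplicaSketch]) -> None:
        self.replicas = replicas
        self.committed: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> bool:
        key = (group, topic, partition)
        acks = [replica.append(key, offset) for replica in self.replicas]
        if not all(acks):
            return False  # no success response unless every replica has the record
        self.committed[key] = offset
        return True

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        return self.committed.get((group, topic, partition))


healthy = OffsetManagerSketch([ReplicaSketch(), ReplicaSketch(), ReplicaSketch()])
ok = healthy.commit("billing", "orders", 0, 42)

degraded = OffsetManagerSketch([ReplicaSketch(), ReplicaSketch(online=False)])
not_ok = degraded.commit("billing", "orders", 0, 42)
print(ok, not_ok)
```

The second commit is not acknowledged because one replica never received the record, which mirrors the "all replicas" condition above.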
_schemas: This is an internal topic used by the Schema Registry, which is a distributed storage layer for Avro schemas. All information relevant to schemas, subjects (with their corresponding versions), metadata and compatibility configuration is appended to this topic. The Schema Registry, in turn, both produces to this topic (e.g. when a new schema is registered under a subject) and consumes data from it.
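A very simplified way to picture this produce-and-consume pattern: registering a schema appends a record to a log, and the registry's view is rebuilt by replaying that log. The record fields, function names and subject name below are assumptions for illustration and do not reflect the Schema Registry's real message format or API.

```python
from typing import Dict, List

Record = Dict[str, object]


def register(log: List[Record], subject: str, schema: str) -> int:
    """Append a new schema version for a subject and return its version number."""
    version = 1 + sum(1 for record in log if record["subject"] == subject)
    log.append({"subject": subject, "version": version, "schema": schema})
    return version


def replay(log: List[Record]) -> Dict[str, Dict[int, str]]:
    """Rebuild the subject -> version -> schema view by reading the log from the start."""
    state: Dict[str, Dict[int, str]] = {}
    for record in log:
        state.setdefault(str(record["subject"]), {})[int(record["version"])] = str(record["schema"])
    return state


schemas_log: List[Record] = []  # stands in for the _schemas topic
v1 = register(schemas_log, "orders-value", '{"type": "string"}')
v2 = register(schemas_log, "orders-value", '{"type": "long"}')
state = replay(schemas_log)
print(v1, v2, sorted(state["orders-value"]))
```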