Why do .index files exist in the kafka-log directory?

Tags:

apache-kafka

I just created a new topic and haven't produced any messages yet. A file named 00000000000000000000.index was created in the directory /tmp/kafka-logs-1/topicname-0/, and it is very large. I opened the binary file in vi, and its contents are only "0000 0000 0000 0000...". What does this mean? What is this index file for?

user2884721 asked Oct 16 '13 03:10



1 Answer

Every segment of a log (the *.log files) has a corresponding index (the *.index files) with the same name, which represents the segment's base offset, i.e. the offset of the first message in that segment.
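To illustrate the naming scheme, here is a small sketch that derives both file names from a base offset. The function name segment_file_names is made up for this example; the only assumption taken from the question is that names are the base offset zero-padded to 20 digits, as in 00000000000000000000.index.

```python
def segment_file_names(base_offset: int) -> tuple[str, str]:
    """Return the (.log, .index) file names for a segment.

    Assumes the name is the base offset zero-padded to 20 digits,
    matching the file name shown in the question.
    """
    stem = f"{base_offset:020d}"
    return f"{stem}.log", f"{stem}.index"


if __name__ == "__main__":
    # A brand-new partition starts at offset 0.
    print(segment_file_names(0))
```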

For context, the log file contains the actual messages, stored in Kafka's message format. For each message in this file, the first 64 bits hold its offset, which increases with every message. Finding a message with a specific offset by scanning this file is expensive, since log files may grow into the gigabyte range. And to be able to produce messages, the broker has to do this kind of lookup to determine the latest offset and assign correctly increasing offsets to incoming messages.

This is why there is an index file. Each entry in the index file consists of only 2 fields, each of them 32 bits long:

  1. 4 Bytes: Relative Offset
  2. 4 Bytes: Physical Position

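As described above, the file name represents the base offset. In contrast to the log file, where every message carries its absolute offset, the entries in the index file store offsets relative to the base offset. The second field is the physical position (the byte position in the log file) of the message whose offset is base offset + relative offset. To find a message, the broker looks for the index entry with the largest offset at or below the target (a binary search over fixed-size entries) and starts reading the log at that position, so it never has to scan the segment from the beginning.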
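A rough sketch of reading such entries and resolving an offset to a log position could look like this. It assumes big-endian 4-byte fields, treats an all-zero entry as unused preallocated space (a simplification), and uses a binary search; parse_index and lookup are illustrative names, not Kafka's API.

```python
import bisect
import struct

# Assumed layout: big-endian 4-byte relative offset + 4-byte physical position.
ENTRY = struct.Struct(">ii")


def parse_index(raw: bytes) -> list[tuple[int, int]]:
    """Decode (relative_offset, position) pairs from raw index bytes."""
    entries = []
    usable = len(raw) - len(raw) % ENTRY.size
    for i in range(0, usable, ENTRY.size):
        rel, pos = ENTRY.unpack_from(raw, i)
        if rel == 0 and pos == 0:
            # Simplifying assumption: zeros mean unused, preallocated space.
            break
        entries.append((rel, pos))
    return entries


def lookup(entries, base_offset: int, target_offset: int) -> int:
    """Return the log position from which to start scanning for target_offset."""
    rels = [rel for rel, _ in entries]
    i = bisect.bisect_right(rels, target_offset - base_offset) - 1
    # No entry at or below the target: scan from the start of the segment.
    return entries[i][1] if i >= 0 else 0
```

With a made-up index holding the entries (40, 4100) and (85, 8300) for a segment with base offset 1000, looking up offset 1050 lands on position 4100, and the reader scans forward in the log from there.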

Finally, not every message in a log has a corresponding entry in the index. The configuration parameter index.interval.bytes, which is 4096 bytes by default, sets the index interval, i.e. how frequently (after how many bytes of log data) an index entry is added.
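The effect of the interval can be sketched with a toy model. This is a simplified illustration that works per message and adds an entry once more than index_interval_bytes have been written since the last one; it is not the broker's actual code, and build_sparse_index is a hypothetical name.

```python
def build_sparse_index(message_sizes, index_interval_bytes=4096):
    """Toy model of a sparse index.

    Returns (relative_offset, position) entries, adding one only after
    more than index_interval_bytes of log data since the previous entry.
    """
    entries = []
    position = 0                # byte position of the next message in the log
    bytes_since_last_entry = 0
    for relative_offset, size in enumerate(message_sizes):
        if bytes_since_last_entry > index_interval_bytes:
            entries.append((relative_offset, position))
            bytes_since_last_entry = 0
        position += size
        bytes_since_last_entry += size
    return entries


if __name__ == "__main__":
    # Twenty messages of 1000 bytes each.
    print(build_sparse_index([1000] * 20))
```

With twenty 1000-byte messages, this model indexes only relative offsets 5, 10, and 15; every other message is found by scanning forward from the nearest earlier entry.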

Regarding the size of the .index file: the configuration parameter segment.index.bytes, which is 10MB by default, sets the size of this file. The space is preallocated and shrinks only after the log rolls. That is why the file is large and contains only zeros before any message has been produced.
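To make the numbers concrete, the sketch below computes how many 8-byte entries fit into the default 10MB and mimics preallocation with a plain file. It is only an analogy for what the question observed, not how the broker itself allocates the file; on most systems the extended area reads back as zeros.

```python
import os
import tempfile

SEGMENT_INDEX_BYTES = 10 * 1024 * 1024  # default segment.index.bytes
ENTRY_SIZE = 8                          # 4-byte relative offset + 4-byte position


def max_index_entries(index_bytes=SEGMENT_INDEX_BYTES, entry_size=ENTRY_SIZE):
    """Number of fixed-size entries that fit into the index file."""
    return index_bytes // entry_size


def preallocate(path, size=SEGMENT_INDEX_BYTES):
    """Create a file of the given size without writing any entries."""
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    print(max_index_entries())
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "0" * 20 + ".index")
        preallocate(path)
        with open(path, "rb") as f:
            # Show the file size and its first 16 bytes.
            print(os.path.getsize(path), f.read(16))
```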

Marc Juchli answered Oct 25 '22 17:10