I am using Kafka 0.8.2.2 and am trying to set up compression. I pass the compression codec (gzip) as an argument to the console producer, as shown below.
./kafka-console-producer.sh --broker-list localhost:171 --compression-codec gzip --topic testTopic
Questions:
1. Is this the only place where I need to specify compression?
2. How do I verify that compression is indeed taking place?
3. How do I quantify the benefit I am getting from compression? Which files (.index, .log) should I look at, comparing their sizes with and without compression, to estimate the benefit?
Producer-Level Message Compression in Kafka
If the producer is sending compressed messages, all the messages in a single producer batch are compressed together and sent as the "value" of a "wrapper message". Compression is more effective the bigger the batch of messages being sent to Kafka.
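The batch-size effect can be seen without Kafka at all. The sketch below is only an illustration under stated assumptions: the message payload is made up, and plain `gzip` on the command line stands in for the producer's gzip codec. It compresses 200 similar messages one at a time, then compresses the same messages as a single batch, and prints both totals.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Generate 200 similar messages, one per line (hypothetical payload).
i=1
while [ "$i" -le 200 ]; do
  printf '{"user_id": %d, "event": "page_view", "region": "eu-west"}\n' "$i"
  i=$((i + 1))
done > "$workdir/messages.txt"

# Total size when every message is gzipped on its own.
individual_bytes=0
while IFS= read -r line; do
  n=$(printf '%s\n' "$line" | gzip -c | wc -c)
  individual_bytes=$((individual_bytes + n))
done < "$workdir/messages.txt"

# Size when the whole batch is gzipped once.
batched_bytes=$(( $(gzip -c < "$workdir/messages.txt" | wc -c) ))

echo "compressed individually: $individual_bytes bytes"
echo "compressed as one batch: $batched_bytes bytes"
```

The batched total should come out much smaller, because the repeated field names are encoded once per batch rather than once per message, and the fixed gzip header is paid only once.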
Making Kafka compression more effective
Batching helps especially with entropy-less encodings like LZ4 and Snappy, because these algorithms work best when the data contains repeated patterns. Two main producer properties are responsible for batching: linger.ms (default is 0) Batch.
If you set the compression.type property in the producer configuration, the messages are compressed before they are sent to the broker. If you set this property in the server configuration, it specifies how the messages are compressed on the broker.
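As a rough sketch of the producer-side setting, the snippet below writes a small producer properties file. The file location and the numeric values are illustrative assumptions, not recommendations; `compression.type`, `linger.ms`, and `batch.size` are configuration names of the Java producer, and whether your client version honors them should be checked against its documentation.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical location for the producer configuration file.
config_file=$(mktemp)

cat > "$config_file" <<'EOF'
# Compress batches on the producer before they are sent to the broker.
compression.type=gzip
# Illustrative values: wait briefly and allow larger batches so that
# more messages are compressed together.
linger.ms=20
batch.size=65536
EOF

echo "wrote $config_file:"
cat "$config_file"
```

Recent versions of the console producer accept such a file through `--producer.config`; older releases such as 0.8.2.2 may need the `--compression-codec` flag shown in the question instead.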
The Kafka cluster does not retain all the published messages.
This parameter allows you to set whether compression should be turned on for particular topics.
How to verify if compression is happening?
Use the DumpLogSegments tool, substituting your own log directory and log file name (the default log.dir is /tmp/kafka-logs):
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /your_kafka_logs_dir/your_topic-your_partition/00000000000000000000.log --print-data-log | grep compresscodec
You will see something like below:
baseOffset: 0 lastOffset: 0 count: 1 ... compresscodec: NONE ...
baseOffset: 1 lastOffset: 1 count: 1 ... compresscodec: GZIP ...
baseOffset: 2 lastOffset: 2 count: 1 ... compresscodec: SNAPPY ...
baseOffset: 3 lastOffset: 3 count: 1 ... compresscodec: LZ4 ...
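When a segment holds many batches, a tally per codec is easier to read than the raw lines. The following is a minimal sketch that assumes the dump output has been saved to a file; here the sample lines above stand in for real output, and the temporary file name is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Stand-in for saved DumpLogSegments output (the sample lines shown above).
dump_file=$(mktemp)
trap 'rm -f "$dump_file"' EXIT
cat > "$dump_file" <<'EOF'
baseOffset: 0 lastOffset: 0 count: 1 ... compresscodec: NONE ...
baseOffset: 1 lastOffset: 1 count: 1 ... compresscodec: GZIP ...
baseOffset: 2 lastOffset: 2 count: 1 ... compresscodec: SNAPPY ...
baseOffset: 3 lastOffset: 3 count: 1 ... compresscodec: LZ4 ...
EOF

# Count how many batches were written with each codec.
grep -o 'compresscodec: [A-Za-z0-9]*' "$dump_file" | sort | uniq -c

# Number of batches that used gzip (0 means nothing was gzip-compressed).
gzip_batches=$(grep -c 'compresscodec: GZIP' "$dump_file" || true)
echo "gzip batches: $gzip_batches"
```

If every line reports NONE, the producer setting did not take effect for that topic.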
More information can be found in the documentation: https://kafka.apache.org/documentation/#design_compression
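To put a number on the benefit, one possible approach is to send the same messages to two topics, one with compression and one without, and compare the total size of the .log segment files in each partition directory. The .log files hold the message data, while the .index files only map offsets to file positions, so they are left out of the comparison. The sketch below shows the arithmetic only: the topic names are hypothetical and the directories contain placeholder files rather than real Kafka data.

```shell
#!/usr/bin/env bash
set -eu

# Print the total size in bytes of all .log segment files in one partition directory.
segment_bytes() {
  cat "$1"/*.log | wc -c | tr -d ' '
}

# Stand-in partition directories with placeholder files (not real Kafka data).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
mkdir -p "$demo_dir/plainTopic-0" "$demo_dir/gzipTopic-0"
head -c 1000 /dev/zero > "$demo_dir/plainTopic-0/00000000000000000000.log"
head -c 100 /dev/zero > "$demo_dir/plainTopic-0/00000000000000000000.index"
head -c 250 /dev/zero > "$demo_dir/gzipTopic-0/00000000000000000000.log"
head -c 100 /dev/zero > "$demo_dir/gzipTopic-0/00000000000000000000.index"

plain_bytes=$(segment_bytes "$demo_dir/plainTopic-0")
gzip_bytes=$(segment_bytes "$demo_dir/gzipTopic-0")

# Integer percentage of disk space saved by compression.
saved_pct=$(( (plain_bytes - gzip_bytes) * 100 / plain_bytes ))
echo "uncompressed: $plain_bytes bytes, gzip: $gzip_bytes bytes, saved: ${saved_pct}%"
```

With real data, point `segment_bytes` at the actual partition directories under your log directory, and make sure both topics received exactly the same messages before comparing.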