I'm trying to understand the compression.type configuration. My question is: if I set compression.type at both the topic level and the producer level, which takes precedence?
**Producer-Level Message Compression in Kafka**

If the producer is sending compressed messages, all the messages in a single producer batch are compressed together and sent as the "value" of a "wrapper message". Compression is more effective the bigger the batch of messages being sent to Kafka.
Kafka supports four compression codecs: gzip, snappy, lz4, and zstd (plus none for no compression).
If you set the compression.type property in the configuration of the producer, messages will be compressed before being sent to the broker. If you set this property in the server configuration, it specifies how messages are compressed when stored on the broker.
At the topic and broker level, compression.type accepts those codec names plus uncompressed and producer (retain whatever codec the producer used).
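For reference, here is roughly how the setting is applied at each of the three levels (a sketch reusing the ZooKeeper address and topic name from the experiments below; kafka-configs.sh is the stock tool for altering topic configs):

# Producer level: in the producer's own config (producer.properties),
# or via --compression-codec for the console producer:
#   compression.type=gzip

# Topic level: per-topic override, applied with kafka-configs.sh
./kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name t \
  --add-config compression.type=gzip

# Broker level: cluster-wide default, in server.properties:
#   compression.type=producer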
When the broker receives a compressed batch of messages from a producer: if the destination topic's compression.type is producer, or if the codecs of the batch and the destination topic are the same, the broker takes the compressed batch from the client and writes it directly to the topic's log file without recompressing the data. Otherwise, the broker decompresses the batch and recompresses it with the topic's codec. Decompression and re-compression can also happen if producers are running a version prior to 0.10, because offsets need to be overwritten, or if any other message format conversion is required.
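That decision can be summarised in a tiny sketch (a hypothetical helper for illustration, not actual broker code):

# decide_codec: illustrates the broker's choice of stored codec.
decide_codec() {
  topic_codec="$1"   # topic/broker compression.type
  batch_codec="$2"   # codec the producer compressed the batch with
  if [ "$topic_codec" = "producer" ] || [ "$topic_codec" = "$batch_codec" ]; then
    echo "$batch_codec"   # batch written as-is, no recompression
  else
    echo "$topic_codec"   # broker decompresses and recompresses
  fi
}

decide_codec producer gzip   # -> gzip (kept as the producer sent it)
decide_codec gzip snappy     # -> gzip (recompressed to the topic codec)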
I tried out some experiments to answer this:
**Note:** server.properties has the config compression.type=producer
./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=producer --topic t
# Message 1: send without compression
./kafka-console-producer.sh --broker-list node:6667 --topic t
# Message 2: send with gzip
./kafka-console-producer.sh --broker-list node:6667 --topic t --compression-codec gzip
# Message 3: send without compression again
./kafka-console-producer.sh --broker-list node:6667 --topic t
sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t-0/00000000000000000000.log
Dumping /kafka-logs/t-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: NONE
offset: 1 position: 69 compresscodec: GZIP
offset: 2 position: 158 compresscodec: NONE
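One can also confirm what the topic itself is configured with by describing its config (same ZooKeeper address as above; the output comment is what I'd expect, not captured from the run):

./kafka-configs.sh --zookeeper localhost:2181 --describe \
  --entity-type topics --entity-name t
# expected: Configs for topic 't' are compression.type=producer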
./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=gzip --topic t1
# Message 1: send without compression
./kafka-console-producer.sh --broker-list node:6667 --topic t1
# Message 2: send with gzip
./kafka-console-producer.sh --broker-list node:6667 --topic t1 --compression-codec gzip
# Message 3: send with snappy
./kafka-console-producer.sh --broker-list node:6667 --topic t1 --compression-codec snappy
sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t1-0/00000000000000000000.log
Dumping /kafka-logs/t1-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: GZIP
offset: 1 position: 89 compresscodec: GZIP
offset: 2 position: 178 compresscodec: GZIP
Clearly, the topic-level setting overrides the codec chosen by the producer.
Regarding compression and decompression, here is the relevant text from *Kafka: The Definitive Guide*:
The Kafka broker must decompress all message batches, however, in order to validate the checksum of the individual messages and assign offsets. It then needs to recompress the message batch in order to store it on disk.
As of version 0.10, there is a new message format that allows for relative offsets in a message batch. This means that newer producers will set relative offsets prior to sending the message batch, which allows the broker to skip recompression of the message batch.
So, when the compression types differ, the topic's compression is honoured and the broker recompresses the batch; when they are the same (or the topic is set to producer), the broker retains the original compression codec set by the producer.
Reference - https://kafka.apache.org/documentation/