If I set 'compression.type' at topic level and producer level, which takes precedence?

Tags:

apache-kafka

I'm trying to understand the 'compression.type' configuration. If I set 'compression.type' at both the topic level and the producer level, which one takes precedence?

Raj asked Jan 24 '20


People also ask

How does compression work in Kafka?

Producer-Level Message Compression in Kafka If the producer is sending compressed messages, all the messages in a single producer batch are compressed together and sent as the "value" of a "wrapper message". Compression is more effective the bigger the batch of messages being sent to Kafka!
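The claim that compression is more effective on bigger batches can be illustrated with a small stdlib-only Python sketch. The messages below are hypothetical repetitive log-style payloads, not Kafka's actual wire format:

```python
import gzip

# Hypothetical repetitive payload; real producer batches behave similarly
# because adjacent records tend to share structure.
message = b'{"user": "alice", "action": "click", "page": "/home"}\n'

for batch_size in (1, 10, 100):
    batch = message * batch_size
    compressed = gzip.compress(batch)
    ratio = len(compressed) / len(batch)
    print(f"batch of {batch_size:3d}: {len(batch):5d} B -> "
          f"{len(compressed):4d} B (ratio {ratio:.2f})")
```

With repetitive payloads the compression ratio improves markedly as the batch grows, which is why compressing whole producer batches (rather than individual records) pays off.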

What is compression type producer Kafka?

Kafka supports four primary types of compression: Gzip, Snappy, Lz4, and Zstd.

Which property is used to specify compression type?

If you set the compression.type property in the producer configuration, messages are compressed before being sent to the broker. If you set this property in the server configuration, it specifies how messages are compressed on the broker.

Is the compression codecs supported in Kafka?

Kafka supports the compression codecs gzip, snappy, lz4, and zstd (the last added in Kafka 2.1), plus none for uncompressed data.


2 Answers

When the broker receives a compressed batch of messages from a producer:

  • it always decompresses the data in order to validate it
  • it considers the compression codec of the destination topic
    • if the topic's codec is producer, or if the batch's codec matches the topic's codec, the broker takes the compressed batch from the client and writes it directly to the topic's log file without recompressing the data
    • otherwise, the broker re-compresses the data to match the codec of the destination topic

Decompression and re-compression can also happen if producers are running a version prior to 0.10 because offsets need to be overwritten, or if any other message format conversion is required.
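The decision steps above can be sketched as a small Python function. This is a simplified model of the broker's behaviour for illustration, not actual Kafka code:

```python
def store_batch(topic_codec: str, batch_codec: str) -> str:
    """Return the codec the broker stores the batch with.

    Simplified model: the broker always decompresses to validate
    (that step is elided here), then decides whether it can keep
    the producer's compressed bytes as-is.
    """
    if topic_codec == "producer" or topic_codec == batch_codec:
        # write the producer's batch as-is, no recompression
        return batch_codec
    # otherwise recompress to match the topic's codec
    return topic_codec

# examples
assert store_batch("producer", "gzip") == "gzip"    # kept as sent
assert store_batch("snappy", "snappy") == "snappy"  # codecs match, kept
assert store_batch("gzip", "snappy") == "gzip"      # recompressed
```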

Ashhar Hasan answered Sep 28 '22


I tried out some experiments to answer this:

**Note:** server.properties has the config compression.type=producer 

./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=producer --topic t

./kafka-console-producer.sh --broker-list node:6667  --topic t
./kafka-console-producer.sh --broker-list node:6667  --topic t --compression-codec gzip
./kafka-console-producer.sh --broker-list node:6667  --topic t

sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t-0/00000000000000000000.log

Dumping /kafka-logs/t-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: NONE 
offset: 1 position: 69 compresscodec: GZIP 
offset: 2 position: 158 compresscodec: NONE 

./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=gzip --topic t1

./kafka-console-producer.sh --broker-list node:6667  --topic t1
./kafka-console-producer.sh --broker-list node:6667  --topic t1 --compression-codec gzip
./kafka-console-producer.sh --broker-list node:6667  --topic t1 --compression-codec snappy

 sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t1-0/00000000000000000000.log
Dumping /kafka-logs/t1-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: GZIP 
offset: 1 position: 89 compresscodec: GZIP 
offset: 2 position: 178 compresscodec: GZIP 

Clearly the topic-level configuration takes precedence.

Regarding compression and decompression, from Kafka: The Definitive Guide:

The Kafka broker must decompress all message batches, however, in order to validate the checksum of the individual messages and assign offsets. It then needs to recompress the message batch in order to store it on disk.

As of version 0.10, there is a new message format that allows for relative offsets in a message batch. This means that newer producers will set relative offsets prior to sending the message batch, which allows the broker to skip recompression of the message batch.

So, when the compression types differ, the topic's compression.type is honoured; when they are the same, the broker retains the original compression codec set by the producer.
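Both experiments above fit a single rule, which can be checked with a short Python sketch. The function below is a model of the observed behaviour (the special compression.type=producer value versus a concrete codec), not Kafka code:

```python
def stored_codec(topic_codec: str, producer_codec: str) -> str:
    # With compression.type=producer the broker keeps whatever the
    # producer sent; otherwise the topic's codec wins.
    if topic_codec == "producer":
        return producer_codec
    return topic_codec

# topic t (--config compression.type=producer): dump showed none/gzip/none
assert [stored_codec("producer", c) for c in ("none", "gzip", "none")] == \
    ["none", "gzip", "none"]

# topic t1 (--config compression.type=gzip): dump showed gzip for all three
assert [stored_codec("gzip", c) for c in ("none", "gzip", "snappy")] == \
    ["gzip", "gzip", "gzip"]
```

The assertions mirror the compresscodec columns of the two DumpLogSegments outputs above.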

Reference - https://kafka.apache.org/documentation/

nandini answered Sep 28 '22