I'm trying to understand the compression.type configuration. My question is: if I set compression.type at both the topic level and the producer level, which takes precedence?
**Producer-Level Message Compression in Kafka**

If the producer is sending compressed messages, all the messages in a single producer batch are compressed together and sent as the "value" of a "wrapper message". Compression is more effective the bigger the batch of messages being sent to Kafka.
Kafka supports four compression codecs: gzip, snappy, lz4, and zstd (plus none for no compression).
If you set the compression.type property in the configuration of the producer, messages will be compressed before being sent to the broker. If you set this property in the server configuration, it specifies how messages are compressed when stored on the broker.
At the topic and broker level, compression.type accepts those codec names plus uncompressed and producer (retain whatever codec the producer used).
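For reference, here is roughly how the setting is applied at each of the three levels (a sketch reusing the ZooKeeper address and topic name from the experiments below; kafka-configs.sh is the stock tool for altering topic configs):

# Producer level: in the producer's own config (producer.properties),
# or via --compression-codec for the console producer:
#   compression.type=gzip

# Topic level: per-topic override, applied with kafka-configs.sh
./kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name t \
  --add-config compression.type=gzip

# Broker level: cluster-wide default, in server.properties:
#   compression.type=producer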
When the broker receives a compressed batch of messages from a producer: if the destination topic's compression.type is producer, or if the codecs of the batch and the destination topic are the same, the broker takes the compressed batch from the client and writes it directly to the topic's log file without recompressing the data. Otherwise, the broker decompresses the batch and recompresses it with the topic's codec. Decompression and re-compression can also happen if producers are running a version prior to 0.10, because offsets need to be overwritten, or if any other message format conversion is required.
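That decision can be summarised in a tiny sketch (a hypothetical helper for illustration, not actual broker code):

# decide_codec: illustrates the broker's choice of stored codec.
decide_codec() {
  topic_codec="$1"   # topic/broker compression.type
  batch_codec="$2"   # codec the producer compressed the batch with
  if [ "$topic_codec" = "producer" ] || [ "$topic_codec" = "$batch_codec" ]; then
    echo "$batch_codec"   # batch written as-is, no recompression
  else
    echo "$topic_codec"   # broker decompresses and recompresses
  fi
}

decide_codec producer gzip   # -> gzip (kept as the producer sent it)
decide_codec gzip snappy     # -> gzip (recompressed to the topic codec)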
I tried out some experiments to answer this:
**Note:** server.properties has the config compression.type=producer
./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=producer --topic t
# Message 1: send without compression
./kafka-console-producer.sh --broker-list node:6667 --topic t
# Message 2: send with gzip
./kafka-console-producer.sh --broker-list node:6667 --topic t --compression-codec gzip
# Message 3: send without compression again
./kafka-console-producer.sh --broker-list node:6667 --topic t
sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t-0/00000000000000000000.log
Dumping /kafka-logs/t-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: NONE
offset: 1 position: 69 compresscodec: GZIP
offset: 2 position: 158 compresscodec: NONE
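One can also confirm what the topic itself is configured with by describing its config (same ZooKeeper address as above; the output comment is what I'd expect, not captured from the run):

./kafka-configs.sh --zookeeper localhost:2181 --describe \
  --entity-type topics --entity-name t
# expected: Configs for topic 't' are compression.type=producer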
./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --config compression.type=gzip --topic t1
# Message 1: send without compression
./kafka-console-producer.sh --broker-list node:6667 --topic t1
# Message 2: send with gzip
./kafka-console-producer.sh --broker-list node:6667 --topic t1 --compression-codec gzip
# Message 3: send with snappy
./kafka-console-producer.sh --broker-list node:6667 --topic t1 --compression-codec snappy
sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t1-0/00000000000000000000.log
Dumping /kafka-logs/t1-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: GZIP
offset: 1 position: 89 compresscodec: GZIP
offset: 2 position: 178 compresscodec: GZIP
Clearly, the topic-level setting overrides the codec chosen by the producer.
Regarding compression and decompression, here is the relevant text from *Kafka: The Definitive Guide*:
The Kafka broker must decompress all message batches, however, in order to validate the checksum of the individual messages and assign offsets. It then needs to recompress the message batch in order to store it on disk.
As of version 0.10, there is a new message format that allows for relative offsets in a message batch. This means that newer producers will set relative offsets prior to sending the message batch, which allows the broker to skip recompression of the message batch.
So, when the compression types differ, the topic's compression is honoured and the broker recompresses the batch; when they are the same (or the topic is set to producer), the broker retains the original compression codec set by the producer.
Reference - https://kafka.apache.org/documentation/