What's the difference between the following ways of enabling compression in Kafka?
Approach 1: Create a topic using the command:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --config compression.type=gzip --topic test
Approach 2: Set the property compression.type=gzip in the Kafka producer client API.
I get better compression and higher throughput when using Approach 1.
If I use Approach 1, does it mean that the compression occurs at the broker end, while in Approach 2 the messages are compressed at the producer end and then sent to the broker?
Kafka supports two types of compression: producer-side and broker-side. Compression enabled on the producer side doesn't require any configuration change in the brokers or the consumers. Producers may choose to compress messages with the compression.type setting; the options are none, gzip, lz4, snappy, and zstd.
When compression is enabled in the producer, it happens entirely on the producer side, so there is no requirement to change the configuration at the consumer or the broker. For example, a producer batch of 200 MB might be reduced to about 101 MB after compression. If you set the compression.type property in the producer configuration, the messages will be compressed before being sent to the broker. If you set the same property in the server (broker) configuration, it instead specifies how the messages will be compressed on the broker.
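As a concrete sketch of Approach 2 with the Java producer client (the broker address, topic name, and record values below are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class GzipProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Approach 2: batches are gzip-compressed by the producer before being sent
        props.put("compression.type", "gzip");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "some-key", "some-value"));
        }
    }
}

Compression is applied per batch, so larger batches (tuned via batch.size and linger.ms) generally compress better.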
Kafka supports compression via the compression.type property. The default value is none, which means messages are sent uncompressed; otherwise, you specify one of the supported types: gzip, snappy, lz4, or zstd. On the producer side, this setting lives in the producer configuration (ProducerConfig); the producer is the source of the data stream and writes messages to one or more topics in a Kafka cluster. Compression can also be configured at the broker and topic level:
An optional broker configuration property, message.max.bytes, can be used to allow all topics on a broker to accept messages larger than the default 1 MB. It holds the value of the largest record batch size allowed by Kafka after compression (if compression is enabled). Additional details are available in the Kafka documentation.
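For illustration, here is a sketch of raising that limit dynamically with the Java AdminClient; the broker id "0" and the 2 MB value are assumptions, not values taken from the question:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseBrokerMessageLimit {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            // broker id "0" is an assumption; use your broker's actual id
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Map<ConfigResource, Collection<AlterConfigOp>> updates = Map.of(
                    broker,
                    List.of(new AlterConfigOp(
                            new ConfigEntry("message.max.bytes", "2097152"), // 2 MB, measured after compression
                            AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}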
At the broker and topic level, compression.type accepts the standard compression codecs (gzip, snappy, lz4, zstd). It additionally accepts uncompressed, which is equivalent to no compression, and producer, which means retain the original compression codec set by the producer.
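Here is a rough equivalent of Approach 1 using the Java AdminClient instead of kafka-topics.sh (same topic name and codec as in the question; the partition count and replication factor are assumed, and note that newer Kafka versions use --bootstrap-server instead of --zookeeper on the CLI):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateGzipTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            // Approach 1: the codec becomes a property of the topic itself, applied broker-side
            NewTopic topic = new NewTopic("test", 1, (short) 1)
                    .configs(Map.of("compression.type", "gzip"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}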
If I use Approach 1, does it mean that the compression occurs at the broker end?
It depends. If the producer does not set compression.type, or sets a different codec, then the messages will be compressed (or re-compressed) at the broker end. However, if the producer also sets compression.type to gzip, there is no need to compress again at the broker. In fact, a few other strict conditions must also be met for the broker to skip recompression (for example, the message format versions must match, otherwise the broker has to rewrite the batches anyway), although that is a bit beyond the scope of this question.
And in Approach 2, the messages are compressed at the producer end and then sent to the broker?
Yes, records will be compressed before being sent to the broker if the producer sets its compression.type config.
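If you want to verify which codec the broker will apply for a given topic, here is a small sketch (reusing the placeholder broker address and the topic name from the question):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class ShowTopicCompression {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test");
            Config config = admin.describeConfigs(List.of(topic)).all().get().get(topic);
            // Prints e.g. "gzip" after Approach 1, or the default "producer"
            System.out.println(config.get("compression.type").value());
        }
    }
}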