I'm measuring Kafka producer performance. So far I've come across two clients with slightly different configuration and usage:
Common:
import java.util.Properties

def buildKafkaConfig(hosts: String, port: Int): Properties = {
  val props = new Properties()
  // Broker address built from the arguments (assumes a single host)
  props.put("metadata.broker.list", s"$hosts:$port")
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  props.put("producer.type", "async")
  props.put("request.required.acks", "0")
  props.put("queue.buffering.max.ms", "5000")
  props.put("queue.buffering.max.messages", "2000")
  props.put("batch.num.messages", "300")
  props
}
First Client:
"org.apache.kafka" % "kafka_2.11" % "0.8.2.2"
Usage:
val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
val producer = new Producer[String, String](new ProducerConfig(kafkaConfig))
// ... somewhere in code
producer.send(new KeyedMessage[String, String]("my-topic", data))
Second Client:
"org.apache.kafka" % "kafka-clients" % "0.8.2.2"
Usage:
val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
val producer = new KafkaProducer[String, String](kafkaConfig)
// ... somewhere in code
producer.send(new ProducerRecord[String, String]("my-topic", data))
My questions are:
What is the difference between the two clients?
They are simply the old and the new API. Starting with 0.8.2.x, Kafka exposed a new set of APIs: the older one is Producer, which works with KeyedMessage[K,V], while the new one is KafkaProducer, which works with ProducerRecord[K,V]. From the documentation:
As of the 0.8.2 release we encourage all new development to use the new Java producer. This client is production tested and generally both faster and more fully featured than the previous Scala client.
You should preferably use the new, supported client.
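Note that the new producer uses different property names from the old one, so the shared buildKafkaConfig from the question would need a counterpart. Below is a minimal sketch of what that might look like, assuming kafka-clients 0.8.2.x and String keys and values. The object and function names are hypothetical, the mapping from the old settings is only approximate (batch.size and buffer.memory are measured in bytes, not messages), and the byte values are placeholders to tune, not recommendations.

```scala
import java.util.Properties

// Hypothetical helper, not part of Kafka: builds settings for the new producer.
object NewProducerConfig {
  def buildNewProducerConfig(hosts: String, port: Int): Properties = {
    val props = new Properties()
    // Replaces metadata.broker.list (assumes a single host)
    props.put("bootstrap.servers", s"$hosts:$port")
    // Replaces serializer.class; the new client configures key and value separately
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // Rough counterpart of request.required.acks
    props.put("acks", "0")
    // Rough counterpart of queue.buffering.max.ms
    props.put("linger.ms", "5000")
    // Placeholder values in bytes; loosely comparable to batch.num.messages
    // and queue.buffering.max.messages in the old client
    props.put("batch.size", "16384")
    props.put("buffer.memory", "33554432")
    props
  }
}
```

The result would then be passed to new KafkaProducer[String, String](...) in place of kafkaConfig. There is no counterpart for producer.type here because the new producer's send is asynchronous by design.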
Which properties should I configure to achieve optimal performance under heavy writes in a high-scale application?
This is a very broad question, and the answer depends a lot on the architecture of your software. It varies with scale, the number of producers, the number of consumers, and so on, so there are many things to take into account. I would suggest going through the documentation and reading the sections on Kafka's architecture and design to get a better picture of how it works internally.
Generally speaking, from my experience you'll need to balance the replication factor of your data against retention times and the number of partitions each topic is split into. If you have more specific questions down the road, you should definitely post a question.