What is the difference between Kafka ProducerRecord and KeyedMessage?

Question

I'm measuring Kafka producer performance. So far I've come across two clients with slightly different configuration and usage:

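Common configuration (shared by both clients):

import java.util.Properties

object KafkaUtils {
  // These keys belong to the old Scala producer's config (kafka.producer.ProducerConfig);
  // the new KafkaProducer uses different names (e.g. bootstrap.servers).
  def buildKafkaConfig(hosts: String, port: Int): Properties = {
    val props = new Properties()
    props.put("metadata.broker.list", s"$hosts:$port") // broker host:port
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("producer.type", "async")
    props.put("request.required.acks", "0")
    props.put("queue.buffering.max.ms", "5000")
    props.put("queue.buffering.max.messages", "2000")
    props.put("batch.num.messages", "300")
    props
  }
}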

First Client:

"org.apache.kafka" % "kafka_2.11" % "0.8.2.2"

Usage:

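import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
val producer = new Producer[String, String](new ProducerConfig(kafkaConfig))

// ... somewhere in code
// Old Scala client: send returns Unit; with producer.type=async the message is queued
producer.send(new KeyedMessage[String, String]("my-topic", data))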

Second Client:

"org.apache.kafka" % "kafka-clients" % "0.8.2.2"

Usage:

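import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
// Note: KafkaProducer reads bootstrap.servers and key/value serializer settings,
// not the old-producer keys that buildKafkaConfig sets.
val producer = new KafkaProducer[String, String](kafkaConfig)
// ... somewhere in code
producer.send(new ProducerRecord[String, String]("my-topic", data))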

My questions are:

What is the difference between the two clients?
Which properties should I configure or take into account to achieve optimal performance for heavy writes in a high-scale application?

Yuval Itzchakov · Accepted Answer

What is the difference between the two clients?

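They are simply the old and new producer APIs. Starting with 0.8.2.x, Kafka exposes a new set of APIs for working with Kafka. The older one is the Scala client kafka.producer.Producer (from the kafka_2.11 artifact), which works with KeyedMessage[K,V]; the new one is the Java client org.apache.kafka.clients.producer.KafkaProducer (from kafka-clients), which works with ProducerRecord[K,V]: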

As of the 0.8.2 release we encourage all new development to use the new Java producer. This client is production tested and generally both faster and more fully featured than the previous Scala client.

You should prefer the new, supported client.
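As a minimal sketch (not part of the original answer), the question's settings might translate to the new producer roughly as follows. The object name NewProducerConfig is hypothetical, and the batch.size value is an assumption: batch.size is measured in bytes, so the old batch.num.messages count of 300 has no direct equivalent. The new producer also has no producer.type setting, because its send() is always asynchronous.

```scala
import java.util.Properties

// Hypothetical helper: settings for org.apache.kafka.clients.producer.KafkaProducer.
// Values mirror the question's old config where a counterpart exists; they are not tuned recommendations.
object NewProducerConfig {
  def build(host: String, port: Int): Properties = {
    val props = new Properties()
    // Replaces metadata.broker.list
    props.put("bootstrap.servers", s"$host:$port")
    // Replaces serializer.class: the new producer needs separate key and value serializers
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // Replaces request.required.acks
    props.put("acks", "0")
    // Closest analogue of queue.buffering.max.ms: how long to wait to fill a batch
    props.put("linger.ms", "5000")
    // batch.size is in BYTES, unlike batch.num.messages (a message count); assumed value
    props.put("batch.size", "65536")
    // No producer.type: KafkaProducer.send() always returns a Future
    props
  }
}
```

The resulting Properties can be passed directly to new KafkaProducer[String, String](props).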

Which properties should I configure or take into account to achieve optimal performance for heavy writes in a high-scale application?

This is a very broad question, and the answer depends heavily on the architecture of your software. It varies with scale, the number of producers, the number of consumers, etc. There are many things to take into account. I would suggest going through the documentation and reading the sections on Kafka's architecture and design to get a better picture of how it works internally.

Generally speaking, in my experience you'll need to balance the replication factor of your data against retention times and the number of partitions each topic is split into. If you have more specific questions down the road, you should definitely post them as a new question.
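For the partition count specifically, a commonly cited rule of thumb is to measure the throughput a single partition sustains when producing (p) and when consuming (c), then use at least max(t/p, t/c) partitions for a target throughput t. A minimal sketch of that arithmetic follows; the object and method names are hypothetical, and the inputs must come from your own benchmarks.

```scala
// Hypothetical helper for the partition-count rule of thumb.
object PartitionSizing {
  // partitions >= max(t / p, t / c), rounded up, where
  //   t = target throughput for the topic
  //   p = measured produce throughput of a single partition
  //   c = measured consume throughput of a single partition
  // All three must use the same unit (e.g. MB/s).
  def minPartitions(target: Double, perPartitionProduce: Double, perPartitionConsume: Double): Int = {
    require(target > 0 && perPartitionProduce > 0 && perPartitionConsume > 0,
      "throughput values must be positive")
    val needed = math.max(target / perPartitionProduce, target / perPartitionConsume)
    math.ceil(needed).toInt
  }
}
```

For example, a 100 MB/s target with 10 MB/s per partition for producing and 20 MB/s for consuming gives max(10, 5) = 10 partitions. This covers only throughput; replication factor and retention still have to be weighed separately.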


Tags: scala, apache-kafka, kafka-producer-api

Julias
