 

What is the difference between Kafka ProducerRecord and KeyedMessage?

I'm measuring Kafka producer performance. I've come across two clients with slightly different configuration and usage:

Common:

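import java.util.Properties

object KafkaUtils {
  def buildKafkaConfig(hosts: String, port: Int): Properties = {
    // Old (Scala) producer settings; the broker list is a "host:port" string
    val brokers = s"$hosts:$port"
    val props = new Properties()
    props.put("metadata.broker.list", brokers)
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("producer.type", "async")                // send from a background thread
    props.put("request.required.acks", "0")            // never wait for a broker acknowledgement
    props.put("queue.buffering.max.ms", "5000")        // max time to buffer messages in async mode
    props.put("queue.buffering.max.messages", "2000")  // max unsent messages queued in async mode
    props.put("batch.num.messages", "300")             // messages sent per batch in async mode
    props
  }
}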

First Client:

"org.apache.kafka" % "kafka_2.11" % "0.8.2.2" 

Usage:

import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
val producer = new Producer[String, String](new ProducerConfig(kafkaConfig))

// ... somewhere in code
producer.send(new KeyedMessage[String, String]("my-topic", data))

Second Client:

"org.apache.kafka" % "kafka-clients" % "0.8.2.2"

Usage:

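import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val kafkaConfig = KafkaUtils.buildKafkaConfig("kafkahost", 9092)
// Caution: KafkaProducer reads bootstrap.servers, key.serializer and value.serializer,
// not the old-producer keys set in buildKafkaConfig
val producer = new KafkaProducer[String, String](kafkaConfig)
// ... somewhere in code
producer.send(new ProducerRecord[String, String]("my-topic", data))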
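The new KafkaProducer is configured with different property names than the old Producer, so passing it the shared buildKafkaConfig output is likely to fail at construction with a missing bootstrap.servers error. Below is a minimal sketch of properties the new producer does read, assuming rough equivalents for the settings above. The helper name buildNewProducerConfig and the batch.size value are illustrative assumptions, not part of the original question; note that batch.size is measured in bytes, not messages.

```scala
import java.util.Properties

object NewProducerConfig {
  // Hypothetical helper: properties for the new Java producer (kafka-clients).
  def buildNewProducerConfig(hosts: String, port: Int): Properties = {
    val props = new Properties()
    // Takes the place of metadata.broker.list
    props.put("bootstrap.servers", s"$hosts:$port")
    // Takes the place of serializer.class; keys and values are configured separately
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // Takes the place of request.required.acks
    props.put("acks", "0")
    // Roughly corresponds to queue.buffering.max.ms: how long to wait for a batch to fill
    props.put("linger.ms", "5000")
    // Roughly corresponds to batch.num.messages, but in bytes per partition batch (value assumed)
    props.put("batch.size", "16384")
    // No producer.type: send() on the new producer is always asynchronous
    props
  }
}
```

With this, `new KafkaProducer[String, String](NewProducerConfig.buildNewProducerConfig("kafkahost", 9092))` would replace the second client's constructor call.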

My questions are:

  • What is the difference between the two clients?
  • Which properties should I configure or take into account to achieve optimal performance for write-heavy workloads in a high-scale application?
asked Mar 12 '23 22:03 by Julias


1 Answer

What is the difference between the two clients?

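They are simply the old and new producer APIs. Starting with 0.8.2.x, Kafka exposes a new set of APIs for working with Kafka. The older one is Producer (from the Scala kafka_2.11 artifact), which works with KeyedMessage[K,V]; the new one is KafkaProducer (from the Java kafka-clients artifact), which works with ProducerRecord[K,V]. As the Kafka documentation puts it: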

As of the 0.8.2 release we encourage all new development to use the new Java producer. This client is production tested and generally both faster and more fully featured than the previous Scala client.

You should prefer the new, supported client.

Which properties should I configure or take into account to achieve optimal performance for write-heavy workloads in a high-scale application?

This is a very broad question, and the answer depends a lot on the architecture of your software. It varies with scale, the number of producers, the number of consumers, and so on. There are many things to take into account. I would suggest going through the documentation and reading up on the sections about Kafka's architecture and design to get a better picture of how it works internally.
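As one illustration of the trade-offs involved, the new producer's settings that most directly affect write throughput can be collected in one place. This is a sketch only: the helper withThroughputSettings is hypothetical, and every value is a placeholder to be tuned by measuring your own workload (record size, latency budget and durability needs), not a recommendation.

```scala
import java.util.Properties

object ThroughputTuning {
  // Hypothetical helper: layers throughput-oriented settings onto an existing
  // new-producer config. All values are illustrative placeholders.
  def withThroughputSettings(base: Properties, durable: Boolean): Properties = {
    val props = new Properties()
    props.putAll(base)
    // Larger batches amortize per-request overhead (bytes per partition batch)
    props.put("batch.size", "65536")
    // Wait up to 10 ms for more records before sending a batch
    props.put("linger.ms", "10")
    // Compress whole batches, trading CPU for network and disk bandwidth
    props.put("compression.type", "snappy")
    // Total memory the producer may use to buffer unsent records (bytes)
    props.put("buffer.memory", "67108864")
    // "all" waits for all in-sync replicas; "1" waits only for the partition leader
    props.put("acks", if (durable) "all" else "1")
    props
  }
}
```

The durable flag makes the central trade-off explicit: stronger acknowledgement settings cost latency and throughput in exchange for not losing writes.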

Generally speaking, in my experience you'll need to balance the replication factor of your data against retention times and the number of partitions each topic is split into. If you have more specific questions down the road, you should definitely post a new question.
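For the partition count specifically, one commonly used rule of thumb is to measure the throughput a single partition sustains for producing (p) and for consuming (c), then pick at least max(t/p, t/c) partitions for a target throughput t. A small sketch of that arithmetic follows; the object name and the numbers in main are made up for illustration.

```scala
object PartitionEstimate {
  // Rule of thumb: partitions >= max(target / perPartitionProduce, target / perPartitionConsume).
  // All throughputs must use the same unit (e.g. MB/s) and be measured on your own cluster.
  def minPartitions(target: Double, perPartitionProduce: Double, perPartitionConsume: Double): Int = {
    require(target > 0 && perPartitionProduce > 0 && perPartitionConsume > 0)
    val needed = math.max(target / perPartitionProduce, target / perPartitionConsume)
    math.ceil(needed).toInt
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical numbers: 100 MB/s target, 10 MB/s per partition producing, 20 MB/s consuming
    println(minPartitions(100.0, 10.0, 20.0)) // 10
  }
}
```

More partitions raise the parallelism ceiling, but each one also adds replication traffic and open files on the brokers, which is why this number has to be balanced against the replication factor.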

answered Mar 16 '23 04:03 by Yuval Itzchakov