
Reading only the latest messages in Spark Kafka streaming

I want to read only the latest messages from Kafka in Spark Streaming, but it also fetches past data.

How do I set auto.offset.reset in KafkaUtils for Spark?

JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

How do I set the configuration so that only current messages are fetched? Please give an example.

There is also another thread on this, but it is not sufficient. Please help me out. Thanks in advance.

mithra asked Mar 18 '23 08:03


1 Answer

You need to use this overload of createStream from the KafkaUtils object, which accepts a kafkaParams map:

 def createStream[K, V, U <: Decoder[_], T <: Decoder[_]](
      jssc: JavaStreamingContext,
      keyTypeClass: Class[K],
      valueTypeClass: Class[V],
      keyDecoderClass: Class[U],
      valueDecoderClass: Class[T],
      kafkaParams: JMap[String, String],
      topics: JMap[String, JInt],
      storageLevel: StorageLevel
    )

Depending on your Spark version, you may not be able to call this method from Java because of a bug.

If you are using Spark 1.1.0, add this property to the kafkaParams map so that a consumer group with no committed offset starts from the newest messages:

"auto.offset.reset", "largest"
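
As a rough sketch of how that property could be supplied from Java: the helper below only builds the map, using the Kafka 0.8 consumer property names `zookeeper.connect` and `group.id` alongside `auto.offset.reset`. The class name, method name, host, and group values are hypothetical, and the `createStream` call is shown only as a comment because it needs the spark-streaming-kafka dependency and, as noted above, may not work from Java on every Spark version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of Spark or Kafka.
class LatestOnlyKafkaParams {

    // Builds the kafkaParams map expected by the createStream overload above.
    static Map<String, String> kafkaParams(String zkQuorum, String groupId) {
        Map<String, String> params = new HashMap<>();
        params.put("zookeeper.connect", zkQuorum);
        params.put("group.id", groupId);
        // Start from the newest offset when the group has no committed offset.
        params.put("auto.offset.reset", "largest");
        return params;
    }

    public static void main(String[] args) {
        // Placeholder values; replace with your own ZooKeeper quorum and group.
        Map<String, String> params = kafkaParams("localhost:2181", "my-group");
        System.out.println(params);

        // With spark-streaming-kafka on the classpath, the map would be passed as:
        // KafkaUtils.createStream(jssc, String.class, String.class,
        //         StringDecoder.class, StringDecoder.class,
        //         params, topicMap, StorageLevel.MEMORY_AND_DISK_SER_2());
    }
}
```

Note that this setting only applies when the group has no committed offset; a group that has already consumed from the topic resumes from where it left off.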

Another workaround is to generate a random groupId prefix, so that each run starts as a new consumer group with no stored offsets, but this is a hack.
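
For completeness, a minimal sketch of that workaround, assuming a UUID suffix is acceptable as the random part. The class name, method name, and prefix are made up for illustration. Keep in mind that each run leaves a new consumer group behind in ZooKeeper.

```java
import java.util.UUID;

// Hypothetical helper class; not part of Spark or Kafka.
class RandomGroupId {

    // Returns a group ID that is different on every call, so the consumer
    // never finds previously committed offsets for it.
    static String randomGroupId(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // The result would be used as the group ID passed to createStream.
        System.out.println(randomGroupId("spark-latest"));
    }
}
```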

ajnavarro answered Apr 20 '23 02:04