Read JSON from Kafka and write JSON to another Kafka topic

I'm trying to prepare an application for Spark Streaming (Spark 2.1, Kafka 0.10).

I need to read data from the Kafka topic "input", find the matching records, and write the result to the topic "output".

I can read data from Kafka using the KafkaUtils.createDirectStream method.

I converted the RDD to JSON and prepared the filters:

val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

messages.map(v => v.value).foreachRDD { rdd =>
  import spark.implicits._

  // Parse each JSON string using the predefined schema
  val PeopleDf = spark.read.schema(schema1).json(rdd)
  PeopleDf.show()

  // Keep rows where value1 contains "1" or value2 equals 2
  val PeopleDfFilter = PeopleDf.filter(($"value1".rlike("1")) || ($"value2" === 2))
  PeopleDfFilter.show()
}

I can also load data from Kafka and write it "as is" back to Kafka using KafkaProducer:

    messages.foreachRDD( rdd => {
      rdd.foreachPartition( partition => {
        val kafkaTopic = "output"
        val props = new HashMap[String, Object]()
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        partition.foreach { record: ConsumerRecord[String, String] =>
          System.out.print("########################" + record.value())
          val messageResult = new ProducerRecord[String, String](kafkaTopic, record.value())
          producer.send(messageResult)
        }
        producer.close()
      })

    })

However, I cannot combine those two steps: finding the proper values in the JSON and writing the findings to Kafka. How can I write PeopleDfFilter in JSON format to the "output" Kafka topic?

I have a lot of input messages in Kafka, which is why I want to use foreachPartition to create the Kafka producer.

asked Nov 23 '17 14:11

Tomtom

1 Answer

The process is very simple, so why not use Structured Streaming all the way?

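import org.apache.spark.sql.functions.{from_json, to_json}
import spark.implicits._  // enables the $"..." column syntax

spark
  // Read the data
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", inservers)
  .option("subscribe", intopic)
  .load()
  // Transform / filter
  .select(from_json($"value".cast("string"), schema).alias("value"))
  .filter(...)  // Add the condition
  .select(to_json($"value").alias("value"))
  // Write back
  .writeStream
  .format("kafka")  // the Kafka sink is available from Spark 2.2
  .option("kafka.bootstrap.servers", outservers)
  .option("topic", outtopic)  // the sink takes "topic"; "subscribe" is a source option
  .option("checkpointLocation", checkpointDir)  // required by the Kafka sink; checkpointDir is a placeholder path
  .start()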
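
If you have to stay on the DStream API (the question targets Spark 2.1, while the Structured Streaming Kafka sink needs 2.2 or later), the two snippets from the question can probably be combined along these lines. This is only a minimal sketch: the object and method names, `bootstrapServers`, and `outputTopic` are hypothetical, `values` is assumed to be `messages.map(_.value)`, and `schema` is assumed to be the question's `schema1`.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object FilterAndForward {

  // values: the JSON strings read from the input topic (messages.map(_.value)).
  // schema: assumed to be the schema1 from the question.
  def filterAndForward(
      spark: SparkSession,
      values: DStream[String],
      schema: StructType,
      bootstrapServers: String,
      outputTopic: String): Unit = {
    import spark.implicits._

    values.foreachRDD { rdd =>
      // Parse the JSON strings with the schema and apply the filter from the question.
      val filtered = spark.read.schema(schema).json(rdd)
        .filter(($"value1".rlike("1")) || ($"value2" === 2))

      // toJSON yields one JSON string per row; .rdd gives a plain RDD[String]
      // so the usual RDD foreachPartition can be used.
      filtered.toJSON.rdd.foreachPartition { partition =>
        val props = new Properties()
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers)
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

        // One producer per partition, created on the executor.
        val producer = new KafkaProducer[String, String](props)
        try {
          partition.foreach { json =>
            producer.send(new ProducerRecord[String, String](outputTopic, json))
          }
        } finally {
          producer.close() // waits for pending sends to complete before closing
        }
      }
    }
  }
}
```

With the names from the question, a call might look like `FilterAndForward.filterAndForward(spark, messages.map(_.value), schema1, "localhost:9092", "output")`. The producer is still created once per partition, as the question intended, but it now sends the filtered rows serialized back to JSON instead of the raw input records.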

answered Oct 18 '22 21:10

user8996943

Tags: scala, apache-kafka, apache-spark, spark-streaming