Spark Streaming: Reading data from Kafka that has multiple schemas

I am struggling with an implementation in Spark Streaming.

The messages from Kafka look like this, but with more fields:

{"event":"sensordata", "source":"sensors", "payload": {"actual data as a json}}
{"event":"databasedata", "mysql":"sensors", "payload": {"actual data as a json}}
{"event":"eventApi", "source":"event1", "payload": {"actual data as a json}}
{"event":"eventapi", "source":"event2", "payload": {"actual data as a json}}

I am trying to read messages from a Kafka topic that contains multiple schemas. I need to read each message, look at its event and source fields, and decide where to store it as a Dataset. The actual data is in the payload field as a JSON object holding a single record. A rough sketch of the flow I have in mind follows.
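
This is a minimal Structured Streaming sketch of the routing I am aiming for, not working code; the broker address, the topic name mytopic, and the console sinks are all placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiSchemaRouting {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("MultiSchemaRouting").getOrCreate()
    import spark.implicits._

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "mytopic")                      // placeholder topic
      .load()

    // Pull out only the routing fields; keep payload as a raw JSON string
    // so each branch can apply its own schema later (e.g. with from_json)
    val parsed = raw
      .selectExpr("CAST(value AS STRING) AS json")
      .select(
        get_json_object($"json", "$.event").as("event"),
        get_json_object($"json", "$.source").as("source"),
        get_json_object($"json", "$.payload").as("payload"))

    // One branch per event type; each branch would get its own sink
    val sensorData = parsed.filter(lower($"event") === "sensordata")
    val eventApi   = parsed.filter(lower($"event") === "eventapi")

    sensorData.writeStream.format("console").start()
    eventApi.writeStream.format("console").start().awaitTermination()
  }
}
```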

Can someone help me implement this, or suggest an alternative?

Is it good practice to send messages with multiple schemas to the same topic and consume them?

Thanks in advance,

asked Nov 08 '22 by koiralo

1 Answer

You can create a DataFrame from the incoming JSON objects:

Create a Seq[String] of the JSON objects.

Convert it to a Dataset[String] and use val df = spark.read.json(jsonSeq.toDS) (spark.read.json accepts a Dataset[String], not a Seq directly).

Perform the operations of your choice on the DataFrame df, as in the sketch below.
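
A minimal sketch of those steps, assuming a local SparkSession and hard-coded sample messages in place of the real Kafka input:

```scala
import org.apache.spark.sql.SparkSession

object JsonSeqExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("JsonSeqExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Step 1: a Seq[String] of JSON objects (sample data for illustration)
    val jsons = Seq(
      """{"event":"sensordata","source":"sensors","payload":{"temp":21.5}}""",
      """{"event":"eventapi","source":"event1","payload":{"id":42}}""")

    // Step 2: spark.read.json takes a Dataset[String], so convert first
    val df = spark.read.json(jsons.toDS)

    // Step 3: operate on the DataFrame, e.g. split it by event type
    df.printSchema()
    df.filter($"event" === "sensordata").show()
  }
}
```

Note that reading mixed schemas into one DataFrame merges them, so payload becomes a struct with the union of all branches' fields; filtering by event before parsing the payload, as the question describes, avoids that.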

answered Nov 15 '22 by SLU