I am struggling with an implementation in Spark Streaming.
The messages from Kafka look like this, but with more fields:
{"event":"sensordata", "source":"sensors", "payload": {"actual data as a json}}
{"event":"databasedata", "mysql":"sensors", "payload": {"actual data as a json}}
{"event":"eventApi", "source":"event1", "payload": {"actual data as a json}}
{"event":"eventapi", "source":"event2", "payload": {"actual data as a json}}
I am trying to read messages from a Kafka topic that carries multiple schemas. For each message I need to inspect the event and source fields and decide which Dataset to store it in. The actual data lives in the payload field as a JSON object containing a single record.
Can someone help me implement this, or suggest an alternative?
Is it good practice to send messages with multiple schemas to the same topic and consume them?
Thanks in advance,
You can create a DataFrame from the incoming JSON objects. Collect the JSON strings into a Seq[String], convert it to a Dataset[String], and pass it to spark.read.json, e.g. val df = spark.read.json(jsonSeq.toDS()). Then perform the operations of your choice on the DataFrame df.
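A minimal sketch of that approach, assuming Spark 2.2+ (where spark.read.json accepts a Dataset[String]); the sample payload fields below are invented for illustration:

    import org.apache.spark.sql.SparkSession

    object MultiSchemaDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("multi-schema-demo")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Stand-ins for messages consumed from the Kafka topic;
        // the payload fields here are made up for illustration
        val messages = Seq(
          """{"event":"sensordata","source":"sensors","payload":{"temperature":21.5}}""",
          """{"event":"eventapi","source":"event1","payload":{"id":42}}"""
        )

        // spark.read.json takes a Dataset[String] (Spark 2.2+), so convert the Seq first
        val df = spark.read.json(messages.toDS())

        // Route rows on the envelope fields and unpack the payload struct per branch;
        // note the inferred payload schema is the merge of all payload shapes in the input
        val sensorDs = df.filter($"event" === "sensordata").select("payload.*")
        val eventDs  = df.filter($"event" === "eventapi").select("payload.*")

        sensorDs.show()
        eventDs.show()

        spark.stop()
      }
    }

On the streaming side, spark.read.json with schema inference is not available on the stream itself, so a common pattern is to cast the Kafka value to a string and pull the envelope fields out with get_json_object, keeping payload as a raw JSON string to parse per branch. A rough sketch continuing the session above, with a made-up broker address and topic name:

    import org.apache.spark.sql.functions.get_json_object

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "mixed-topic")                  // assumed topic name
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      .select(
        get_json_object($"json", "$.event").as("event"),
        get_json_object($"json", "$.source").as("source"),
        get_json_object($"json", "$.payload").as("payload"))

    // Each branch can then be filtered out and written to its own sink
    val sensorStream = stream.filter($"event" === "sensordata")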