I'm working on a Spark Streaming implementation in Scala where I pull JSON strings from a Kafka topic and want to load them into a DataFrame. Is there a way to do this where Spark infers the schema on its own from an RDD[String]?
Yes, you can use the following:

sqlContext.read
  //.schema(schema) // optional; makes it a bit faster. If you've processed the data before, you can get the schema with df.schema
  .json(jsonRDD)    // jsonRDD: RDD[String]
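For the streaming case in the question, the usual pattern is to call this inside foreachRDD so each micro-batch becomes a DataFrame. Here's a minimal sketch, assuming Spark 1.x with the spark-streaming-kafka 0.8 direct API; the broker address, topic name, and batch interval are placeholders for your setup:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaJsonToDataFrame {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaJsonToDataFrame")
    val ssc = new StreamingContext(conf, Seconds(10))
    val sqlContext = SQLContext.getOrCreate(ssc.sparkContext)

    // Placeholder broker list and topic
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Each Kafka message is a (key, value) pair; the JSON payload is the value
        val df = sqlContext.read.json(rdd.map(_._2)) // schema inferred per batch
        df.show()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Keep in mind that inferring the schema means an extra pass over each batch's data, which is why supplying a precomputed schema via .schema(...) is a bit faster, as noted above.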
I'm trying to do the same at the moment. I'm curious how you got the RDD[String] out of Kafka, though; I'm still under the impression that Spark+Kafka only does streaming rather than a one-off "take what's in there right now" batch read. :)
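For what it's worth, the direct Kafka API does support one-off batch reads via KafkaUtils.createRDD, provided you know the offset range you want to read. A rough sketch, assuming the spark-streaming-kafka 0.8 artifact; the broker address, topic, and offsets are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

// sc: SparkContext, sqlContext: SQLContext already created elsewhere
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")

// Read messages 0 to 100 (exclusive) from partition 0 of topic "events"
val offsetRanges = Array(OffsetRange("events", 0, 0L, 100L))

val batchRdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)

val jsonRdd = batchRdd.map(_._2) // RDD[String] of message payloads
val df = sqlContext.read.json(jsonRdd)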