
Create Spark DataFrame in Spark Streaming from JSON Message on Kafka

I'm working on a Spark Streaming implementation in Scala where I pull JSON strings from a Kafka topic and want to load them into a DataFrame. Is there a way to do this where Spark infers the schema on its own from an RDD[String]?

masmithd asked Jun 26 '15 14:06

1 Answer

Yes, you can use the following:

val df = sqlContext.read
  // .schema(schema)  // optional, makes it a bit faster; if you've processed
  //                  // this data before, you can get the schema via df.schema
  .json(jsonRDD)      // jsonRDD: RDD[String], one JSON document per element
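To show where that call fits in a streaming job, here is a minimal sketch using the Spark 1.x APIs (spark-streaming-kafka's KafkaUtils.createDirectStream); the broker address and topic name are placeholders, not values from the question:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("KafkaJsonToDataFrame")
val ssc = new StreamingContext(conf, Seconds(10))
val sqlContext = new SQLContext(ssc.sparkContext)

// Placeholder broker list and topic
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("myTopic"))

stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // Each Kafka message value is a JSON string; Spark infers the schema.
    val df = sqlContext.read.json(rdd.map(_._2)) // _._2 is the message value
    df.show()
  }
}

ssc.start()
ssc.awaitTermination()

Inferring the schema on every micro-batch adds overhead, which is why the answer suggests passing a known schema via .schema(...) once you have one.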

I'm trying to do the same at the moment. I'm curious how you got the RDD[String] out of Kafka, though; I'm still under the impression that Spark + Kafka only does streaming, rather than a one-off "take what's in there right now" batch read. :)
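For what it's worth, the 1.x Kafka integration does offer a one-off batch read: KafkaUtils.createRDD reads a fixed offset range from a topic without starting a StreamingContext. A sketch, with the broker, topic, and offsets as placeholder values:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

def readBatch(sc: SparkContext) = {
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
  // Read partition 0 of "myTopic" from offset 0 up to (exclusive) offset 100
  val offsetRanges = Array(OffsetRange("myTopic", 0, 0L, 100L))
  KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
    sc, kafkaParams, offsetRanges) // RDD[(String, String)] of (key, value)
}

The resulting RDD of (key, value) pairs can then be mapped to its values and fed to sqlContext.read.json as above.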

Kiara Grouwstra answered Sep 23 '22 05:09