I need to read some JSON data from a web service that provides a REST interface, so I can query it from my Spark SQL code for analysis. I am able to read a JSON file stored in the blob store and use it.
I was wondering what the best way is to read the data from a REST service and use it like any other DataFrame.
BTW, I am using Spark 1.6 on a Linux cluster on HDInsight, if that helps. I would also appreciate any code snippets, as I am still very new to the Spark environment.
Spark cannot parse an arbitrary JSON document into a DataFrame, because JSON is a hierarchical structure while a DataFrame is flat. If your JSON was not created by Spark, chances are it does not satisfy the condition "each line must contain a separate, self-contained valid JSON object" (the line-delimited format that Spark 1.6's read.json expects), and hence it will need to be parsed by your own code and then fed to a DataFrame as a collection of case-class objects or Spark SQL Rows.
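For illustration, this is the line-delimited shape read.json can ingest directly (the records here are made up):

{"id": 1, "name": "alice"}
{"id": 2, "name": "bob"}

whereas a single pretty-printed document or a top-level JSON array, like most REST services return, is not in that shape and needs the custom parsing described below:

[
  {"id": 1, "name": "alice"},
  {"id": 2, "name": "bob"}
]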
You can download the payload with scalaj-http like this:
import scalaj.http._
val response = Http("proto:///path/to/json")
  .header("key", "val")   // GET is the default method
  .asString
  .body
and then parse your JSON as shown in this answer. Then build a Seq of instances of your case class (say seq) and create a DataFrame with
import sqlContext.implicits._
seq.toDF
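Putting the pieces together, a minimal end-to-end sketch might look like the following. The endpoint URL and the Person case class are hypothetical placeholders for your service's actual schema; parsing uses json4s, which ships with Spark 1.6, and assumes the service returns a JSON array of objects.

import scalaj.http._
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record shape -- adjust the fields to match your service's JSON
case class Person(id: Int, name: String)

object RestToDataFrame {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rest-to-df"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // 1. Fetch the JSON payload on the driver (hypothetical endpoint)
    val body: String = Http("https://example.com/api/people")
      .header("Accept", "application/json")
      .asString
      .body

    // 2. Parse it into case-class instances
    implicit val formats: Formats = DefaultFormats
    val people: List[Person] = parse(body).extract[List[Person]]

    // 3. Turn the local collection into a DataFrame and query it
    val df = people.toDF()
    df.registerTempTable("people")   // Spark 1.6 API
    sqlContext.sql("SELECT name FROM people WHERE id > 1").show()
  }
}

This driver-side fetch is fine for small payloads. As an alternative, if the body already happens to be line-delimited JSON, you can skip the case class entirely with sqlContext.read.json(sc.parallelize(body.split("\n"))).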