 

How to read in-memory JSON string into Spark DataFrame

I'm trying to read an in-memory JSON string into a Spark DataFrame on the fly:

var someJSON : String = getJSONSomehow()
val someDF : DataFrame = magic.convert(someJSON)

I've spent quite a bit of time looking at the Spark API, and the best I can find is to use a sqlContext like so:

var someJSON : String = getJSONSomehow()
val tmpFilePath = s"/tmp/json/${UUID.randomUUID().toString}.json"
val tmpFile : Output = Resource.fromFile(tmpFilePath)
tmpFile.write(someJSON)(Codec.UTF8)
val someDF : DataFrame = sqlContext.read.json(tmpFilePath)

But this feels kind of awkward/wonky and imposes the following constraints:

  1. It requires me to format my JSON as one object per line (per the documentation);
  2. It forces me to write the JSON to a temp file, which is slow and awkward; and
  3. It forces me to clean up temp files over time, which is cumbersome and feels "wrong" to me.

So I ask: Is there a direct and more efficient way to convert a JSON string into a Spark DataFrame?

asked Sep 21 '16 by smeeb

People also ask

How to read and parse JSON and convert to Dataframe in spark?

Assume you have a text file containing JSON data, or a CSV file with a JSON string in one of its columns. To read these files, parse the JSON, and convert it to a DataFrame, use the from_json() function provided by Spark SQL.
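For illustration, a minimal sketch of from_json() in Scala, assuming Spark 2.1 or later; the column name, schema, and sample record below are invented for this example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A column holding raw JSON strings (as it might look after loading a CSV or text file)
val raw = Seq("""{"name":"Yin","city":"Columbus"}""").toDF("json")

// Schema of the embedded JSON
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("city", StringType)))

// Parse the JSON column into a struct, then flatten it into top-level columns
val parsed = raw.select(from_json($"json", schema).as("data")).select("data.*")
parsed.show()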

How to read small in memory JSON string in spark?

This approach can be used for processing a small in-memory JSON string. The following sample JSON string will be used: a simple JSON array with three items, where each item has two attributes, ID (integer) and ATTR1 (string). In Spark, the DataFrameReader object can be used to read JSON.
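A minimal sketch of that idea, assuming Spark 2.2+ (where spark.read.json accepts a Dataset[String] directly); the sample values are invented:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// The sample described above: a JSON array of three items, each with ID and ATTR1
val json = """[{"ID":1,"ATTR1":"a"},{"ID":2,"ATTR1":"b"},{"ID":3,"ATTR1":"c"}]"""

// spark.read.json accepts a Dataset[String]; a top-level array is expanded into one row per element
val df = spark.read.json(Seq(json).toDS())
df.show()
df.printSchema()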

How do I write to a JSON file from a Dataframe?

Write Spark DataFrame to JSON file: use the DataFrameWriter returned by the DataFrame's write method to write a JSON file. While writing a JSON file you can set several options. DataFrameWriter also has a mode() method to specify the SaveMode; the argument is either a mode name as a string or a constant from the SaveMode class.
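A short sketch of that writer API; the sample DataFrame and the output path below are made up:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("ID", "ATTR1")

// Write the DataFrame as JSON, replacing any existing output at that path
df.write
  .mode(SaveMode.Overwrite)   // equivalently: .mode("overwrite")
  .json("/tmp/json-out")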

How to read multiple JSON files from different paths in spark?

Use spark.read.option("multiline", "true") to read JSON records that span multiple lines. Using the spark.read.json() method you can also read multiple JSON files from different paths: just pass all the file names, with fully qualified paths, as comma-separated arguments. You can likewise read all the files in a directory by passing the directory path.
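A sketch of those variants (the paths are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// A JSON file whose records span multiple lines
val multiLineDF = spark.read.option("multiline", "true").json("/tmp/json/multiline.json")

// Several files at once: pass each fully qualified path as a separate argument
val manyDF = spark.read.json("/tmp/json/day1.json", "/tmp/json/day2.json")

// Or everything in a directory
val dirDF = spark.read.json("/tmp/json/")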


1 Answer

From the Spark SQL guide:

val otherPeopleRDD = spark.sparkContext.makeRDD(
"""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleRDD)
otherPeople.show()

This creates the DataFrame from an intermediate RDD built from a single JSON String, with no temp file involved.
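As a side note, on Spark 2.2+ the intermediate RDD can be skipped, since spark.read.json also accepts a Dataset[String] directly; a minimal sketch:

import spark.implicits._

// Spark 2.2+: pass a Dataset[String] instead of an RDD
val jsonDS = Seq("""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""").toDS()
val otherPeopleDS = spark.read.json(jsonDS)
otherPeopleDS.show()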

answered Sep 29 '22 by bear911