
How to parse nested JSON objects in spark sql?

I have a schema as shown below. How can I parse the nested objects?

root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)
asked Apr 29 '15

People also ask

How do you convert nested JSON to DataFrame in Pyspark?

Convert to DataFrame: add the JSON string as a collection type and pass it as an input to spark.createDataset. This converts it to a DataFrame. The JSON reader infers the schema automatically from the JSON string.
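The approach above can be sketched in Scala as well (a minimal sketch, assuming a Spark 2.2+ SparkSession named `spark`; the JSON string below is a hypothetical record shaped like the question's schema):

```scala
import spark.implicits._  // brings .toDS into scope for Seq

// Hypothetical JSON string matching the question's schema
val jsonString =
  """{"id":"1","apps":[{"appName":"A","appPackage":"com.a",
    |"Ratings":[{"date":"2015-04-29","rating":5}]}]}""".stripMargin

// Wrap the string in a Dataset[String] and let the JSON reader infer the schema
val df = spark.read.json(Seq(jsonString).toDS)
df.printSchema()
```

In Spark 1.x, where `spark.read.json(Dataset[String])` is not available, the equivalent is `sqlContext.read.json(sc.parallelize(Seq(jsonString)))`.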


2 Answers

Assuming you read in a JSON file and print the schema shown above like this:

DataFrame df = sqlContext.read().json("/path/to/file").toDF();
df.registerTempTable("df");
df.printSchema();

Then you can select nested fields inside a struct type like so:

DataFrame app = df.select("apps");
app.registerTempTable("app");
app.printSchema();
app.show();

DataFrame appName = app.select("apps.appName");
appName.registerTempTable("appName");
appName.printSchema();
appName.show();
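Selecting `apps.appName` through an array of structs returns the field as an array, one per row. To get one row per rating instead, the array can be flattened with `explode`. A hedged Scala sketch (assuming `df` holds the question's schema and `sqlContext` is in scope):

```scala
import org.apache.spark.sql.functions.explode

// One row per app element, keeping the top-level id alongside it
val apps = df.select(df("id"), explode(df("apps")).as("app"))

// One row per rating element inside each app
val ratings = apps.select(
  apps("id"),
  apps("app.appName"),
  apps("app.appPackage"),
  explode(apps("app.Ratings")).as("r")
)

// Fully flattened view: id, appName, appPackage, date, rating
ratings.select("id", "appName", "appPackage", "r.date", "r.rating").show()
```

Each `explode` multiplies the row count by the length of the array it unpacks, so the final DataFrame has one row per (app, rating) pair.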
answered Sep 17 '22 by Vasilis Vagias


Try this:

val nameAndAddress = sqlContext.sql("""
  SELECT name, address.city, address.state
  FROM people
""")
nameAndAddress.collect.foreach(println)

Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
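The same SQL approach can be applied to the question's schema; since `apps` and `Ratings` are arrays rather than plain structs, the dotted paths need a `LATERAL VIEW explode` to unpack each level. A sketch, assuming the DataFrame is registered as a temp table named `df` (the LATERAL VIEW syntax may require a HiveContext on Spark 1.x):

```scala
val flattened = sqlContext.sql("""
  SELECT id, app.appName, app.appPackage, r.date, r.rating
  FROM df
  LATERAL VIEW explode(apps) a AS app
  LATERAL VIEW explode(app.Ratings) b AS r
""")
flattened.show()
```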

answered Sep 19 '22 by ben jarman