I have a schema as shown below. How can I parse the nested objects?
root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)
Convert to DataFrame: add the JSON string to a collection and pass it to spark.createDataset; reading the resulting Dataset with the JSON reader produces a DataFrame. The JSON reader infers the schema automatically from the JSON string.
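A minimal Scala sketch of that approach (the session setup and the sample JSON string are assumptions, shaped to match the schema in the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("json-parse")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// A single JSON record matching the posted schema
val json = Seq(
  """{"id":"1","apps":[{"appName":"A","appPackage":"com.a","Ratings":[{"date":"2020-01-01","rating":5}]}]}"""
)

// Reading a Dataset[String] with the JSON reader infers the schema
val df = spark.read.json(spark.createDataset(json))
df.printSchema()
```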
Assuming you read in a JSON file and print the schema you are showing us, like this:
```java
DataFrame df = sqlContext.read().json("/path/to/file").toDF();
df.registerTempTable("df");
df.printSchema();
```
Then you can select nested objects inside a struct type like so...
```java
// The schema's array column is named "apps", so explode it first
// to get one row per struct element
DataFrame apps = df.select(
    org.apache.spark.sql.functions.explode(df.col("apps")).as("app"));
apps.registerTempTable("apps");
apps.printSchema();
apps.show();

// Fields of the exploded struct can now be selected with dot notation
DataFrame appName = apps.select("app.appName");
appName.registerTempTable("appName");
appName.printSchema();
appName.show();
```
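To reach the doubly nested Ratings array from the question's schema, a second explode is needed. A sketch in Scala (assuming a DataFrame `df` read from the same JSON):

```scala
import org.apache.spark.sql.functions.explode

// Explode the outer apps array, then the inner Ratings array,
// keeping the top-level id alongside each flattened row
val ratings = df
  .select($"id", explode($"apps").as("app"))
  .select($"id", $"app.appName", explode($"app.Ratings").as("r"))
  .select($"id", $"appName", $"r.date", $"r.rating")

ratings.show()
```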
Try this:
```scala
val nameAndAddress = sqlContext.sql("""
  SELECT name, address.city, address.state
  FROM people
""")
nameAndAddress.collect.foreach(println)
```
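The same SQL approach can reach the nested arrays in the question's schema using LATERAL VIEW explode; a sketch, assuming the DataFrame has been registered as a table named `df`:

```scala
df.registerTempTable("df")

// Each LATERAL VIEW explode unnests one level of array
val flat = sqlContext.sql("""
  SELECT id, app.appName, r.date, r.rating
  FROM df
  LATERAL VIEW explode(apps) a AS app
  LATERAL VIEW explode(app.Ratings) rv AS r
""")
flat.show()
```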
Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html