Spark SQL from_json documentation

Tags:

apache-spark-sql

Where can I find more detailed information about the schema parameter of the from_json function in Spark SQL? A coworker gave me a schema example that works, but to be honest, I just don't understand it, and it doesn't look like any of the examples I have found so far. The documentation found here seems to be lacking.


asked May 16 '18 14:05

Michael Blahay

1 Answer

In the link you shared, the from_json function is documented with this example:

SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');

Spark SQL supports the vast majority of Hive features, including the syntax for defining types, so the schema argument is a string of column names and types written in that style.
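To make the type syntax more concrete, here is a small sketch of schema strings for the common nested types. The JSON literals and field names (tags, point, attrs) are made up for illustration; only the first query comes from the documentation example above.

```sql
-- Flat schema: comma-separated "name TYPE" pairs (the documentation example).
SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');

-- Array of strings (hypothetical field name "tags").
SELECT from_json('{"tags":["x","y"]}', 'tags ARRAY<STRING>');

-- Nested object as a struct; inside STRUCT<...> fields are written "name: TYPE".
SELECT from_json('{"point":{"x":1,"y":2}}', 'point STRUCT<x: INT, y: INT>');

-- Object with arbitrary keys as a map (hypothetical field name "attrs").
SELECT from_json('{"attrs":{"k":"v"}}', 'attrs MAP<STRING, STRING>');
```

These types can be nested inside each other, which is what the array-of-struct schema further down does.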

The example problem I was facing required me to parse the following JSON object:

{"data": [
    {
       "id": 2938,
       "price": 2938.0,
       "quantity": 1
    },
    {
       "id": 123,
       "price": 123.5,
       "quantity": 2
    }
]}

The corresponding Spark SQL query would look like this:

SELECT
    from_json('{"data":[{"id":123, "quantity":2, "price":39.5}]}',
    'data array<struct<id:INT, quantity:INT, price:DOUBLE>>').data AS product_details;

You can couple this with the explode function to extract each element of the array into its own row.
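A minimal sketch of that combination, assuming the same schema as in the query above. The second array element and the aliases item and exploded are invented for the example. explode produces one row per array element, and the struct fields can then be selected as ordinary columns:

```sql
SELECT item.id, item.quantity, item.price
FROM (
    -- explode turns the parsed array into one row per struct
    SELECT explode(
        from_json(
            '{"data":[{"id":123, "quantity":2, "price":39.5}, {"id":456, "quantity":1, "price":10.0}]}',
            'data array<struct<id:INT, quantity:INT, price:DOUBLE>>'
        ).data
    ) AS item
) AS exploded;
```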

I recommend this post to learn more about constructing the types for your query.

Refer to this SO post for more examples: https://stackoverflow.com/a/55432107/1500443


answered Oct 20 '22 21:10

Buthetleon

