
Spark SQL from_json documentation

Where can I find more detailed information regarding the schema parameter of the from_json function in Spark SQL? A coworker gave me a schema example that works, but to be honest, I just don't understand and it doesn't look like any of the examples I have found thus far. The documentation found here seems to be lacking.

Michael Blahay asked May 16 '18 14:05


People also ask

How do I read a JSON file into Spark?

JSON Files. Spark SQL can automatically infer the schema of a JSON dataset and load it as a Dataset[Row]. This conversion can be done using SparkSession.read.json() on either a Dataset[String], or a JSON file. Note that the file that is offered as a json file is not a typical JSON file.
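
The same inference can be triggered from pure Spark SQL by registering the file as a view (a sketch — the path and view name are illustrative):

```sql
-- The schema is inferred automatically from the JSON file at the given path.
CREATE TEMPORARY VIEW products
USING json
OPTIONS (path '/tmp/products.json');

SELECT * FROM products;
```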

How to convert JSON string to struct type in spark dataframe?

Now, by using from_json(Column jsonStringColumn, StructType schema), you can convert a JSON string column on a Spark DataFrame to a struct type. To do so, you first need to create a StructType describing the JSON string (using the types in org.apache.spark.sql.types).

How do I infer the schema of a JSON dataset?

Spark SQL can automatically infer the schema of a JSON dataset and load it as a Dataset[Row]. This conversion can be done using SparkSession.read.json() on either a Dataset[String], or a JSON file. Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object.

How do I write to a JSON file from a Dataframe?

Write Spark DataFrame to JSON file: use the Spark DataFrameWriter object's "write" method on a DataFrame to write a JSON file. While writing a JSON file you can use several options. Spark DataFrameWriter also has a mode() method to specify the SaveMode; the argument to this method is either a string or a constant from the SaveMode class.
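
On the pure SQL side, Spark can also write query results out as JSON using INSERT OVERWRITE DIRECTORY (a sketch — the directory path and columns here are illustrative):

```sql
-- Writes the SELECT result to the target directory as JSON files.
INSERT OVERWRITE DIRECTORY '/tmp/products_json'
USING json
SELECT 123 AS id, 2 AS quantity, 39.5 AS price;
```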



1 Answer

In the link you shared, the documentation for the from_json function uses this example:

SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');

Spark SQL supports the vast majority of Hive features, including Hive's DDL syntax for defining types.

The example problem I was facing required me to parse the following JSON object:

{"data": [
    {
       "id": 2938,
       "price": 2938.0,
       "quantity": 1
    },
    {
       "id": 123,
       "price": 123.5,
       "quantity": 2
    }
]}

(Note that JSON numbers must not have leading zeros, so an id like 02938 has to be written as 2938.)

The corresponding Spark SQL query would look like this:

SELECT
    from_json(
        '{"data":[{"id":123, "quantity":2, "price":39.5}]}',
        'data array<struct<id:INT, quantity:INT, price:DOUBLE>>'
    ).data AS product_details;

You can couple this with the explode function to extract each array element into its own row.
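
For example (a sketch reusing the same illustrative schema as above), explode emits one row per array element, and the struct fields can then be selected out as columns:

```sql
-- Each element of the parsed array becomes its own row;
-- product.id etc. then pull the struct fields out as columns.
SELECT product.id, product.quantity, product.price
FROM (
    SELECT explode(
        from_json(
            '{"data":[{"id":123, "quantity":2, "price":39.5}]}',
            'data array<struct<id:INT, quantity:INT, price:DOUBLE>>'
        ).data
    ) AS product
) t;
```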

I recommend this post to learn more about constructing the types for your query.

Refer to this SO post for more examples: https://stackoverflow.com/a/55432107/1500443

Buthetleon answered Oct 20 '22 21:10