 

SparkSQL sql syntax for nth item in array

I have a JSON object with an unfortunate combination of nesting and arrays, so it's not totally obvious how to query it with Spark SQL.

Here is a sample object:

{
  stuff: [
    {a:1,b:2,c:3}
  ]
}

So, in JavaScript, to get the value for c, I'd write myData.stuff[0].c

And in my Spark SQL query, if that array weren't there, I'd be able to use dot notation:

SELECT stuff.c FROM blah

but I can't, because the innermost object is wrapped in an array.

I've tried:

SELECT stuff.0.c FROM blah // FAIL
SELECT stuff.[0].c FROM blah // FAIL

So, what is the magical way to select that data? Or is that even supported yet?

Kristian asked Jan 21 '16

People also ask

How do I get the length of an array in Spark SQL?

If you are using Spark SQL, you can also use the size() function, which returns the size of an array or map type column.
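
For example, a minimal sketch, assuming a DataFrame df with an array column named stuff as in the question above (the alias is only for a readable output column name):

    from pyspark.sql.functions import size

    # size() counts the elements of an array (or entries of a map) column
    df.select(size(df["stuff"]).alias("n")).show()
    ## +---+
    ## |  n|
    ## +---+
    ## |  1|
    ## +---+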

How do you display the contents of a DataFrame?

Use the show() function to display a DataFrame. Its n parameter sets the number of rows to display, and its truncate parameter controls whether long values are cut off: it defaults to True, so set truncate to False to display the full column content.
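
A quick sketch, assuming a DataFrame df:

    # Display the first 5 rows with full (untruncated) column content
    df.show(n=5, truncate=False)

    # Defaults: 20 rows, long values truncated
    df.show()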

What is SQLContext?

SQLContext is the entry point to Spark SQL, a Spark module for structured data processing. Once a SQLContext is initialised, you can use it to perform various SQL-like operations over Datasets and DataFrames.
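
A minimal sketch of the initialisation in PySpark (the application name is an arbitrary placeholder):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="example")  # placeholder app name
    sqlContext = SQLContext(sc)

    # sqlContext can now load data and run SQL-like operations
    sqlContext.sql("SELECT 1").show()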


1 Answer

It is not clear what you mean by "JSON object", so let's consider two different cases:

  1. An array of structs

    import tempfile

    # Write the sample JSON to a temporary file so it can be read back
    path = tempfile.mktemp()
    with open(path, "w") as fw:
        fw.write('''{"stuff": [{"a": 1, "b": 2, "c": 3}]}''')

    # Read it as a DataFrame and register it as a temporary table for SQL
    df = sqlContext.read.json(path)
    df.registerTempTable("df")
    
    df.printSchema()
    ## root
    ##  |-- stuff: array (nullable = true)
    ##  |    |-- element: struct (containsNull = true)
    ##  |    |    |-- a: long (nullable = true)
    ##  |    |    |-- b: long (nullable = true)
    ##  |    |    |-- c: long (nullable = true)
    
    sqlContext.sql("SELECT stuff[0].a FROM df").show()
    
    ## +---+
    ## |_c0|
    ## +---+
    ## |  1|
    ## +---+
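
    # If you prefer the DataFrame API over raw SQL, the same lookup can be
    # written with column indexing; a sketch against the df defined above
    # (the alias is only for a readable output column name):
    from pyspark.sql.functions import col

    df.select(col("stuff")[0]["a"].alias("a")).show()

    ## +---+
    ## |  a|
    ## +---+
    ## |  1|
    ## +---+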
    
  2. An array of maps

    # Note: schema inference from dictionaries has been deprecated
    # don't use this in practice
    df = sc.parallelize([{"stuff": [{"a": 1, "b": 2, "c": 3}]}]).toDF()
    df.registerTempTable("df")
    
    df.printSchema()
    ## root
    ##  |-- stuff: array (nullable = true)
    ##  |    |-- element: map (containsNull = true)
    ##  |    |    |-- key: string
    ##  |    |    |-- value: long (valueContainsNull = true)
    
    sqlContext.sql("SELECT stuff[0]['a'] FROM df").show()
    ## +---+
    ## |_c0|
    ## +---+
    ## |  1|
    ## +---+
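
    # Since inferring the schema from dictionaries is deprecated, a sketch
    # of declaring the map-typed column explicitly with createDataFrame
    # (the schema below is an assumption matching the sample data):
    from pyspark.sql.types import (ArrayType, LongType, MapType,
                                   StringType, StructField, StructType)

    schema = StructType([
        StructField("stuff", ArrayType(MapType(StringType(), LongType())))
    ])
    df = sqlContext.createDataFrame([([{"a": 1, "b": 2, "c": 3}],)], schema)
    df.registerTempTable("df")

    sqlContext.sql("SELECT stuff[0]['a'] FROM df").show()
    ## +---+
    ## |_c0|
    ## +---+
    ## |  1|
    ## +---+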
    

See also Querying Spark SQL DataFrame with complex types

zero323 answered Oct 01 '22