For background, I loaded the JSON using:
val df = sqlContext.read.json("s3n://...")
df.registerTempTable("posts")
I have the following schema for my table in Spark:
scala> posts.printSchema
root
 |-- command: string (nullable = true)
 |-- externalId: string (nullable = true)
 |-- sourceMap: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- url: string (nullable = true)
 |-- type: string (nullable = true)
I want to select all posts with the hashtag "nike":
sqlContext.sql("select sourceMap['hashtags'] as ht from posts where ht.contains('nike')");
I get the error "undefined function ht.contains".
I am not sure which function to use to search within the array.
Thanks!
I found the answer by referring to Hive SQL:
sqlContext.sql("select sourceMap['hashtags'] from posts where array_contains(sourceMap['hashtags'], 'nike')");
The key function is array_contains(), which returns true when the array contains the given value.
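For anyone who prefers the DataFrame API over a SQL string, the same filter can probably be written as below. This is only a sketch: it assumes Spark 1.5 or later (where array_contains is available in org.apache.spark.sql.functions), that it is run in spark-shell, and that `posts` is the DataFrame whose schema is printed in the question. The name `nikePosts` is made up for illustration.

```scala
import org.apache.spark.sql.functions.{array_contains, col}

// Assumption: `posts` is the DataFrame from the question, with a
// struct column `sourceMap` holding an array<string> field `hashtags`.
val nikePosts = posts
  // Keep only rows whose hashtags array contains the literal "nike".
  .filter(array_contains(col("sourceMap.hashtags"), "nike"))
  // Project the nested array and give it the alias used in the question.
  .select(col("sourceMap.hashtags").as("ht"))

nikePosts.show()
```

Note that the filter is applied to the original nested column rather than to the alias `ht`, for the same reason as in the SQL version: the alias does not exist yet when the rows are filtered.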