
Query in Spark SQL inside an array

To give the background, I loaded the JSON using:

val posts = sqlContext.read.json("s3n://...")
posts.registerTempTable("posts")

I have the following schema for my table in Spark:

scala> posts.printSchema
root
 |-- command: string (nullable = true)
 |-- externalId: string (nullable = true)
 |-- sourceMap: struct (nullable = true)
 |    |-- hashtags: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- url: string (nullable = true)
 |-- type: string (nullable = true)

I want to select all posts with the hashtag "nike":

sqlContext.sql("select sourceMap['hashtags'] as ht from posts where ht.contains('nike')");

This fails with an error: undefined function ht.contains.

I am not sure what method to use to search within the array.

Thanks!

asked Mar 03 '16 by lazywiz

1 Answer

I found the answer by referring to Hive SQL:

sqlContext.sql("select sourceMap['hashtags'] from posts where array_contains(sourceMap['hashtags'], 'nike')");

The key function is array_contains(), which checks whether an array column contains a given value.
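
For completeness, the same filter can also be expressed with the DataFrame API instead of SQL. This is a minimal sketch, assuming the posts DataFrame loaded in the question; array_contains is available in org.apache.spark.sql.functions:

import org.apache.spark.sql.functions.array_contains

// Keep only rows whose sourceMap.hashtags array contains "nike",
// then select the hashtags array as a column named "ht".
val nikePosts = posts
  .filter(array_contains(posts("sourceMap.hashtags"), "nike"))
  .select(posts("sourceMap.hashtags").as("ht"))

nikePosts.show()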

answered Oct 14 '22 by lazywiz