
Spark SQL: querying nested data structures in a JSON dataset

I have a simple JSON dataset, shown below. How do I query every parts.lock value for the record with id=1?

JSON:

{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"],
    "parts" : [
        {
            "lock" : "One lock",
            "key" : "single key"
        },
        {
            "lock" : "2 lock",
            "key" : "2 key"
        }
    ]
}

Query:

select id, name, price, parts.lock from product where id=1

If I use parts[0].lock instead, it returns only one row:

{u'price': 12.5, u'id': 1, u'.lock': {u'lock': u'One lock', u'key': u'single key'}, u'name': u'A green door'}

But I want to return all the locks in the parts array. That means multiple rows for a single id, which is what I am looking for: one row per part, much like the result of a relational join.

Please help me with this.

Sathish asked Nov 11 '22 03:11


1 Answer

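import org.apache.spark.sql.functions.explode
import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named spark
// df is the DataFrame loaded from the JSON above; explode emits one row per element of parts
df.select($"id", $"name", $"price", explode($"parts").alias("elem"))
  .where("id = 1")
  .select("id", "name", "price", "elem.lock", "elem.key").show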

+---+------------+-----+--------+----------+
| id|        name|price|    lock|       key|
+---+------------+-----+--------+----------+
|  1|A green door| 12.5|One lock|single key|
|  1|A green door| 12.5|  2 lock|     2 key|
+---+------------+-----+--------+----------+
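
To see why this produces two rows, here is a minimal plain-Python sketch of what explode does to a single record. It does not use Spark at all, and the helper name explode_rows is made up for illustration: each element of the array becomes its own row, and the selected parent fields are repeated on every row. In a Spark SQL query string, the corresponding step is usually written with LATERAL VIEW explode(parts).

```python
import json

# The record from the question, parsed into a plain dict.
record = json.loads("""
{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"],
    "parts": [
        {"lock": "One lock", "key": "single key"},
        {"lock": "2 lock", "key": "2 key"}
    ]
}
""")


def explode_rows(rec, array_field, keep):
    """Hypothetical helper: build one output row per element of rec[array_field].

    The fields named in `keep` are copied onto every row, and the keys of
    each array element are merged in as extra columns. A missing or empty
    array yields no rows.
    """
    rows = []
    for elem in rec.get(array_field) or []:
        row = {name: rec[name] for name in keep}
        row.update(elem)
        rows.append(row)
    return rows


rows = explode_rows(record, "parts", ["id", "name", "price"])
for row in rows:
    print(row)
```

This prints two dicts, one per element of parts, each carrying the same id, name and price, which matches the two rows in the table above.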
Jeremy answered Nov 13 '22 08:11