Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splittling list of JSON key/value pairs into columns of a row in a Dataset

I have a column with a list of key/value objects:

+----+--------------------------------------------------------------------------------------------+
|ID  | Settings                                                                                   |
+----+--------------------------------------------------------------------------------------------+
|1   | [{"key":"key1","value":"val1"}, {"key":"key2","value":"val2"}, {"key":"key3","value":"val3"}] |
+----+--------------------------------------------------------------------------------------------+

Is it possible to split this list of objects into its own row? As such:

+----+------+-------+-------+
|ID  | key1 | key2  |  key3 |
+----+------+-------+-------+
|1   | val1 | val2  |  val3 |
+----+------+-------+-------+

I've tried exploding, and placing into a Struct:

   case class Setting(key: String, value: String)
   val newDF = df.withColumn("setting", explode($"settings"))
                .select($"id", from_json($"setting" Encoders.product[Setting].schema) as 'settings)

which gives me:

+------+------------------------------+
|ID    |settings                      |
+------+------------------------------+
|1     |[key1,val1]                   |
|1     |[key2,val2]                   |
|1     |[key3,val3]                   |
+------+------------------------------+

And from here I can use the specifies rows by such settings.key But its not quite what I need. I need to access multiple keys in the one row of data

like image 867
Martin Avatar asked Mar 06 '23 10:03

Martin


1 Answers

You are almost near, If you already got this

+------+------------------------------+
|ID    |settings                      |
+------+------------------------------+
|1     |[key1,val1]                   |
|1     |[key2,val2]                   |
|1     |[key3,val3]                   |
+------+------------------------------+

Now you can use pivot to reshape the data as

newDF.groupBy($"ID")
  .pivot("settings.key")
  .agg(first("settings.value"))

Group by ID and use pivot, Use agg to get the first value but you can use any other function here.

Output:

+---+----+----+----+
|ID |key1|key2|key3|
+---+----+----+----+
|1  |val1|val2|val3|
+---+----+----+----+

Hope this helps!

like image 114
koiralo Avatar answered Mar 09 '23 00:03

koiralo