I have a column with a list of key/value objects:
+----+--------------------------------------------------------------------------------------------+
|ID | Settings |
+----+--------------------------------------------------------------------------------------------+
|1 | [{"key":"key1","value":"val1"}, {"key":"key2","value":"val2"}, {"key":"key3","value":"val3"}] |
+----+--------------------------------------------------------------------------------------------+
Is it possible to split this list of objects into its own row? As such:
+----+------+-------+-------+
|ID | key1 | key2 | key3 |
+----+------+-------+-------+
|1 | val1 | val2 | val3 |
+----+------+-------+-------+
I've tried exploding, and placing into a Struct:
case class Setting(key: String, value: String)
val newDF = df.withColumn("setting", explode($"settings"))
.select($"id", from_json($"setting" Encoders.product[Setting].schema) as 'settings)
which gives me:
+------+------------------------------+
|ID |settings |
+------+------------------------------+
|1 |[key1,val1] |
|1 |[key2,val2] |
|1 |[key3,val3] |
+------+------------------------------+
And from here I can use the specifies rows by such settings.key But its not quite what I need. I need to access multiple keys in the one row of data
You are almost near, If you already got this
+------+------------------------------+
|ID |settings |
+------+------------------------------+
|1 |[key1,val1] |
|1 |[key2,val2] |
|1 |[key3,val3] |
+------+------------------------------+
Now you can use pivot to reshape the data as
newDF.groupBy($"ID")
.pivot("settings.key")
.agg(first("settings.value"))
Group by ID and use pivot
, Use agg
to get the first
value but you can use any other function
here.
Output:
+---+----+----+----+
|ID |key1|key2|key3|
+---+----+----+----+
|1 |val1|val2|val3|
+---+----+----+----+
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With