I have a DataFrame with two columns, with data as below:
+----+-----------------+
|acct|           device|
+----+-----------------+
|   B|       List(3, 4)|
|   C|       List(3, 5)|
|   A|       List(2, 6)|
|   B|List(3, 11, 4, 9)|
|   C|       List(5, 6)|
|   A|List(2, 10, 7, 6)|
+----+-----------------+
I need the result as below:
+----+-----------------+
|acct|           device|
+----+-----------------+
|   B|List(3, 4, 11, 9)|
|   C|    List(3, 5, 6)|
|   A|List(2, 6, 7, 10)|
+----+-----------------+
I tried the following, but it does not seem to work:
df.groupBy("acct").agg(concat("device"))
df.groupBy("acct").agg(collect_set("device"))
Please let me know how I can achieve this using Scala.
You can start by exploding the device column and then continue as you did. Note that this might not preserve the order of the lists (which isn't guaranteed in any group-by anyway):
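import org.apache.spark.sql.functions.{collect_set, explode}
// Assumes spark.implicits._ is in scope so that $"device" resolves to a Column.
// explode emits one row per array element; collect_set then gathers the distinct values per acct.
val result = df.withColumn("device", explode($"device"))
  .groupBy("acct")
  .agg(collect_set("device"))
result.show(truncate = false)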
// +----+-------------------+
// |acct|collect_set(device)|
// +----+-------------------+
// |B   |[9, 3, 4, 11]      |
// |C   |[5, 6, 3]          |
// |A   |[2, 6, 10, 7]      |
// +----+-------------------+
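To see what the explode-then-collect step is doing, here is a minimal sketch of the same logic on plain Scala collections. This is not Spark code: the object name MergeDevices and the helper mergeByAcct are made up for illustration, and the rows are copied from the question. Unlike collect_set, distinct on a local collection keeps the first occurrence of each value, so the order within each list is predictable here.

```scala
// Plain Scala collections stand in for the DataFrame rows
// (an assumption for illustration; no Spark involved).
object MergeDevices {
  // Sample rows copied from the question: (acct, device list).
  val rows: Seq[(String, List[Int])] = Seq(
    "B" -> List(3, 4),
    "C" -> List(3, 5),
    "A" -> List(2, 6),
    "B" -> List(3, 11, 4, 9),
    "C" -> List(5, 6),
    "A" -> List(2, 10, 7, 6)
  )

  // "Explode" every row into (acct, device) pairs, group the pairs by acct,
  // then drop duplicates. distinct keeps the first occurrence of each value.
  def mergeByAcct(data: Seq[(String, List[Int])]): Map[String, List[Int]] =
    data
      .flatMap { case (acct, devices) => devices.map(d => acct -> d) }
      .groupBy(_._1)
      .map { case (acct, pairs) => acct -> pairs.map(_._2).distinct.toList }

  def main(args: Array[String]): Unit =
    mergeByAcct(rows).foreach { case (acct, devices) =>
      println(s"$acct -> $devices")
    }
}
```

The result is a Map, so the order in which the accounts are printed is not specified; only the order inside each list follows first appearance.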