Spark: grouping rows in array by key

I have a Spark dataset like this one:

key id val1 val2 val3
1   a  a1   a2   a3
2   a  a4   a5   a6
3   b  b1   b2   b3
4   b  b4   b5   b6
5   b  b7   b8   b9
6   c  c1   c2   c3

I would like to group all rows by id into a list or array, like this:

(a, ([1   a  a1   a2   a3], [2   a  a4   a5   a6]) ),
(b, ([3   b  b1   b2   b3], [4   b  b4   b5   b6], [5   b  b7   b8   b9]) ),
(c, ([6   c  c1   c2   c3]) )

I have used map to output key/value pairs with the right key, but I am having trouble building the final key/array pairs.

Can anybody help with that?

Marco Tizzoni asked Feb 16 '17 11:02



1 Answer

How about this:

import org.apache.spark.sql.functions._
// Pack the columns of each row into one array column, then collect those arrays per id
df.withColumn("combined", array("key", "id", "val1", "val2", "val3"))
  .groupBy("id").agg(collect_list(col("combined")))

The array function combines the listed columns into a single array column; after that it is a simple groupBy with collect_list, which gathers the arrays for each id into one list.
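If you want to try the same idea end to end, here is a minimal sketch, not a definitive implementation. It assumes a local SparkSession and rebuilds the sample data from the question. The object name GroupRowsById, the helper groupRows and the output column name "rows" are made up for illustration. It also swaps array for struct, which is my own variation: a struct keeps each column's name and type, so key can stay an integer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object GroupRowsById {
  // Hypothetical helper: packs every column of each row into a struct,
  // then collects the structs that share the same id into one list.
  def groupRows(df: DataFrame): DataFrame =
    df.withColumn("combined", struct(df.columns.map(col): _*))
      .groupBy("id")
      .agg(collect_list(col("combined")).as("rows"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its session elsewhere.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("group-rows-by-id")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      (1, "a", "a1", "a2", "a3"),
      (2, "a", "a4", "a5", "a6"),
      (3, "b", "b1", "b2", "b3"),
      (4, "b", "b4", "b5", "b6"),
      (5, "b", "b7", "b8", "b9"),
      (6, "c", "c1", "c2", "c3")
    ).toDF("key", "id", "val1", "val2", "val3")

    // One output row per id, with all of that id's original rows in "rows".
    groupRows(df).orderBy("id").show(truncate = false)

    spark.stop()
  }
}
```

Note that collect_list does not guarantee the order of the elements inside each list after the shuffle, so sort the list afterwards if the order of rows within a group matters.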

Assaf Mendelson answered Sep 29 '22 01:09
