Spark Dataframe groupBy and sort results into a list

I have a Spark DataFrame, and I would like to group its rows by a key and collect each group's values into a sorted list.

Currently I am using:

df.groupBy("columnA").agg(collect_list("columnB"))

How do I sort the items in each list in ascending order?

user2392965 asked Aug 01 '16 05:08


1 Answer

You could try the function sort_array available in the functions package:

import org.apache.spark.sql.functions._
df.groupBy("columnA").agg(sort_array(collect_list("columnB")))
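For illustration, here is a minimal self-contained sketch of the same idea. The sample rows, the local SparkSession settings, and the names SortedGroupExample and sortedGroups are assumptions made up for this example; only groupBy, collect_list, and sort_array come from the snippet above. It also uses the two-argument form of sort_array to get a descending list.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_list, sort_array}

object SortedGroupExample {

  // Groups by columnA and collects columnB into sorted arrays.
  // sort_array sorts by the natural ordering of the elements;
  // the second argument selects ascending (default) or descending.
  def sortedGroups(df: DataFrame): DataFrame =
    df.groupBy("columnA").agg(
      sort_array(collect_list("columnB")).as("ascending"),
      sort_array(collect_list("columnB"), asc = false).as("descending")
    )

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sorted-collect-list")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the column names from the question.
    val df = Seq(("a", 3), ("a", 1), ("b", 2), ("a", 2), ("b", 1))
      .toDF("columnA", "columnB")

    sortedGroups(df).orderBy("columnA").show(false)

    spark.stop()
  }
}
```

With this sample data, group "a" should come out as [1, 2, 3] ascending and [3, 2, 1] descending. Sorting after collecting avoids relying on the row order that collect_list happens to see, which is not guaranteed after a shuffle.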
Daniel de Paula answered Sep 29 '22 15:09