Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

takeOrdered descending Pyspark

i would like to sort K/V pairs by values and then take the biggest five values. I managed to do this with reverting K/V with first map, sort in descending order with FALSE, and then reverse key.value to the original (second map) and then take the first 5 that are the bigget, the code is this:

RDD.map(lambda x:(x[1],x[0])).sortByKey(False).map(lambda x:(x[1],x[0])).take(5)

i know there is a takeOrdered action on pySpark, but i only managed to sort on values (and not on key), i don't know how to get a descending sorting:

RDD.takeOrdered(5,key = lambda x: x[1])
like image 539
arj Avatar asked Jun 11 '15 17:06

arj


People also ask

How do you sort descending in PySpark?

The Desc method is used to order the elements in descending order. By default the sorting technique used is in Ascending order, so by the use of Desc method, we can sort the element in Descending order in a PySpark Data Frame. The orderBy clause is used to return the row in a sorted Manner.

How do I sort values in PySpark?

You can use either sort() or orderBy() function of PySpark DataFrame to sort DataFrame by ascending or descending order based on single or multiple columns, you can also do sorting using PySpark SQL sorting functions, In this article, I will explain all these different ways using PySpark examples.

How do you sort a PySpark Dataframe in ascending order?

We can use either orderBy() or sort() method to sort the data in the dataframe. Pass asc() to sort the data in ascending order; otherwise, desc(). We can do this based on a single column or multiple columns.


1 Answers

Sort by keys (ascending):

RDD.takeOrdered(5, key = lambda x: x[0])

Sort by keys (descending):

RDD.takeOrdered(5, key = lambda x: -x[0])

Sort by values (ascending):

RDD.takeOrdered(5, key = lambda x: x[1])

Sort by values (descending):

RDD.takeOrdered(5, key = lambda x: -x[1])
like image 193
aatishk Avatar answered Oct 08 '22 15:10

aatishk