I would like to sort K/V pairs by value and then take the five largest values. I managed to do this by swapping key and value with a first map, sorting by key in descending order (passing False), swapping back to the original (key, value) order with a second map, and then taking the first 5, which are the largest. The code is this:
RDD.map(lambda x:(x[1],x[0])).sortByKey(False).map(lambda x:(x[1],x[0])).take(5)
I know there is a takeOrdered action in PySpark, but I only managed to sort on values (rather than on keys) in ascending order; I don't know how to get a descending sort:
RDD.takeOrdered(5,key = lambda x: x[1])
The desc() method sorts elements in descending order. Sorting is ascending by default, so desc() is needed to sort a PySpark DataFrame in descending order. orderBy() returns the rows in sorted order.
You can use either the sort() or orderBy() function of a PySpark DataFrame to sort it in ascending or descending order by one or more columns. You can also sort with the PySpark SQL sorting functions.
We can use either the orderBy() or sort() method to sort the data in a DataFrame. Pass asc() to sort in ascending order or desc() for descending order, on a single column or on multiple columns.
Sort by keys (ascending):
RDD.takeOrdered(5, key = lambda x: x[0])
Sort by keys (descending; negating the key assumes the keys are numeric):
RDD.takeOrdered(5, key = lambda x: -x[0])
Sort by values (ascending):
RDD.takeOrdered(5, key = lambda x: x[1])
Sort by values (descending):
RDD.takeOrdered(5, key = lambda x: -x[1])
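To see why negating the key gives a descending result, here is a minimal sketch in plain Python that runs without a Spark cluster. It assumes that takeOrdered behaves like "return the num smallest elements according to key"; the helper take_ordered and the sample pairs are hypothetical stand-ins for illustration, not part of PySpark.

```python
import heapq

# Hypothetical sample data standing in for the contents of the RDD.
pairs = [("a", 3), ("b", 9), ("c", 1), ("d", 7), ("e", 5), ("f", 8), ("g", 2)]


def take_ordered(items, num, key=None):
    """Local stand-in (an assumption) for RDD.takeOrdered: the num smallest items by key."""
    return heapq.nsmallest(num, items, key=key)


# Ascending by value: the five smallest values.
smallest_five = take_ordered(pairs, 5, key=lambda x: x[1])

# Descending by value: negating a numeric key turns "smallest" into "largest".
largest_five = take_ordered(pairs, 5, key=lambda x: -x[1])

# The approach from the question, done locally: swap, sort descending, swap back.
swapped = sorted(((v, k) for k, v in pairs), reverse=True)
via_swap = [(k, v) for v, k in swapped][:5]

print(largest_five)
print(via_swap == largest_five)
```

On this sample data both approaches return the same five pairs. When the values are not numeric and cannot be negated, RDD.top(5, key=lambda x: x[1]) may be the simpler choice, since top returns the largest elements directly.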