takeOrdered descending Pyspark

Tags:

python

apache-spark

I would like to sort K/V pairs by value and then take the five largest values. I managed to do this by swapping key and value with a first map, sorting in descending order with sortByKey(False), swapping key and value back with a second map, and then taking the first 5, which are the largest. The code is:

RDD.map(lambda x:(x[1],x[0])).sortByKey(False).map(lambda x:(x[1],x[0])).take(5)

I know there is a takeOrdered action in PySpark, but I only managed to sort on values (and not on keys), and I don't know how to get a descending sort:

RDD.takeOrdered(5,key = lambda x: x[1])

539

asked Jun 11 '15 17:06

arj

1 Answer

Sort by keys (ascending):

RDD.takeOrdered(5, key = lambda x: x[0])

Sort by keys (descending):

RDD.takeOrdered(5, key = lambda x: -x[0])

Sort by values (ascending):

RDD.takeOrdered(5, key = lambda x: x[1])

Sort by values (descending):

RDD.takeOrdered(5, key = lambda x: -x[1])
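
To see why negating the key gives a descending result, here is a minimal sketch that runs without a Spark cluster. It assumes numeric values, uses a made-up list `pairs` in place of the RDD, and uses a hypothetical helper `take_ordered` that imitates takeOrdered with `heapq.nsmallest`. It is only an illustration of the ordering logic, not PySpark itself.

```python
import heapq

# Hypothetical sample data standing in for the contents of the RDD.
pairs = [("a", 3), ("b", 10), ("c", 7), ("d", 1), ("e", 8), ("f", 5), ("g", 9)]


def take_ordered(items, num, key=None):
    """Local stand-in for takeOrdered: the num smallest items under key."""
    return heapq.nsmallest(num, items, key=key)


# Negating the value flips the ordering, so the largest values rank first.
top_by_value = take_ordered(pairs, 5, key=lambda x: -x[1])

# The approach from the question: swap, sort descending, swap back, take 5.
swapped = sorted(((v, k) for k, v in pairs), reverse=True)
via_sort = [(k, v) for v, k in swapped][:5]

print(top_by_value)
print(top_by_value == via_sort)
```

The negation trick only applies when the key or value is numeric. For other comparable types such as strings, RDD.top(5, key = lambda x: x[1]) should return the five largest elements in descending order without any negation.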

193

answered Oct 08 '22 15:10

aatishk
