I'm using PySpark (Python 2.7.9 / Spark 1.3.1) and have a DataFrame GroupObject which I need to filter and sort in descending order. I'm trying to achieve this with the following piece of code.
group_by_dataframe.count().filter("`count` >= 10").sort('count', ascending=False)
But it throws the following error.
sort() got an unexpected keyword argument 'ascending'
You can use either the sort() or orderBy() function of a PySpark DataFrame to sort it in ascending or descending order based on one or more columns, and you can also sort using PySpark SQL sorting functions. In this article, I will explain all of these approaches with PySpark examples.
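Throughout the examples below I'll use a small, hypothetical DataFrame df; a minimal setup sketch, assuming Spark 2.x or newer for the SparkSession API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sorting-examples").getOrCreate()

# A small, made-up DataFrame used in the examples that follow.
df = spark.createDataFrame(
    [("sales", 90, 12), ("sales", 75, 8), ("hr", 60, 15)],
    ["dept", "salary", "count"],
)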
We will sort the table using the sort() function, accessing the column with the col() function and wrapping it in desc() to sort in descending order.
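A minimal sketch of this approach, reusing the hypothetical df from the setup above:

from pyspark.sql.functions import col

# Descending sort on the "count" column via Column.desc().
df.sort(col("count").desc()).show()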
We can use either the orderBy() or sort() method to sort the data in a DataFrame. Pass asc() to sort the data in ascending order, or desc() for descending. We can do this based on a single column or multiple columns.
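For example, a sketch of a multi-column sort on the hypothetical dept and salary columns:

from pyspark.sql.functions import asc, desc

# Ascending by "dept", then descending by "salary" within each department.
df.orderBy(asc("dept"), desc("salary")).show()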
Note that in the DataFrame API, sort() and orderBy() are aliases: both perform a full, global sort of the data. The behavior sometimes attributed to sort(), where each partition is sorted individually and the overall output order is not guaranteed, actually belongs to sortWithinPartitions() (the equivalent of SQL's SORT BY). That variant is cheaper because it avoids a full shuffle, whereas a global sort must first repartition the data by range.
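To make the distinction concrete, a sketch contrasting the two (sortWithinPartitions() is available from Spark 1.6 onward):

from pyspark.sql.functions import desc

# Global sort: shuffles by range so the output is totally ordered.
globally_sorted = df.orderBy(desc("count"))

# Per-partition sort: each partition is sorted independently, so the
# overall output order is not guaranteed, but no full shuffle occurs.
locally_sorted = df.sortWithinPartitions(desc("count"))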
In PySpark 1.3 the sort method doesn't take an ascending parameter. You can use the desc method instead:
from pyspark.sql.functions import col

# Sort by the aggregated count in descending order via Column.desc().
(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(col("count").desc()))
or the desc function:
from pyspark.sql.functions import desc

# desc("count") builds the same descending sort expression from a column name.
(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort(desc("count")))
Both methods can be used with Spark >= 1.3 (including Spark 2.x).
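For completeness: on newer Spark versions the code from the question works as written, since sort() later gained support for the ascending keyword; a quick sketch:

# In newer Spark versions sort() accepts the ascending keyword directly.
(group_by_dataframe
    .count()
    .filter("`count` >= 10")
    .sort("count", ascending=False))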