 

How to order by multiple columns in pyspark

I have a data frame:

Price   sq.ft   constructed
15000   800     22/12/2019
80000   1200    25/12/2019
90000   1400    15/12/2019
70000   1000    10/11/2019
80000   1300    24/12/2019
15000   950     26/12/2019
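For reproducibility, a minimal sketch that builds this data frame (assuming constructed is stored as a dd/MM/yyyy string, as displayed above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data from the table above. Note that sorting dd/MM/yyyy
# strings is lexicographic; it happens to give the intended order
# for this sample, but parsing with to_date would be safer in general.
df = spark.createDataFrame(
    [(15000, 800, "22/12/2019"),
     (80000, 1200, "25/12/2019"),
     (90000, 1400, "15/12/2019"),
     (70000, 1000, "10/11/2019"),
     (80000, 1300, "24/12/2019"),
     (15000, 950, "26/12/2019")],
    ["Price", "sq.ft", "constructed"],
)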

I want to sort by multiple columns at once. I obtained the result I wanted, but I am looking for a better way to do it. Below is my code:

from pyspark.sql import functions as F, Window
from pyspark.sql.functions import col

df.select("*", F.row_number().over(
    Window.partitionBy("Price").orderBy(col("Price").desc(), col("constructed").desc())).alias("Value")).display()

which produces:

Price   sq.ft   constructed   Value
15000   950     26/12/2019    1
15000   800     22/12/2019    2
70000   1000    10/11/2019    1
80000   1200    25/12/2019    1
80000   1300    24/12/2019    2
90000   1400    15/12/2019    1

Rather than repeating col("column name").desc() for each column, is there a better way to do it? I have also tried the following:

df.select("*",F.row_number().over(
    Window.partitionBy("Price").orderBy(["Price","constructed"],ascending = False).alias("Rank"))).display()

which raises an error:

TypeError: orderBy() got an unexpected keyword argument 'ascending'
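(The ascending keyword is accepted by DataFrame.orderBy/sort but not by Window.orderBy, which only takes column expressions. A minimal sketch of where that flag is valid, assuming the same df:)

# ascending works at the DataFrame level, not on a Window spec
df.orderBy(["Price", "constructed"], ascending=False).display()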
Toi asked Nov 17 '25


1 Answer

You can use a list comprehension:

from pyspark.sql import functions as F, Window

Window.partitionBy("Price").orderBy(*[F.desc(c) for c in ["Price","constructed"]])
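Applied to the original query, this gives the same result without repeating col(...).desc() for each column (a minimal sketch, assuming the same df and the display() call used in the question; sort_cols is just an illustrative name):

from pyspark.sql import functions as F, Window

# Build all descending sort expressions from a list of column names
sort_cols = ["Price", "constructed"]
w = Window.partitionBy("Price").orderBy(*[F.desc(c) for c in sort_cols])

df.select("*", F.row_number().over(w).alias("Value")).display()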
mck answered Nov 20 '25


