I need a window function that partitions by some keys (=column names), orders by another column name and returns the rows with top x ranks.
This works fine for ascending order:
def getTopX(df: DataFrame, top_x: String, top_key: String, top_value: String): DataFrame = {
  val top_keys: List[String] = top_key.split(",").map(_.trim).toList
  val w = Window.partitionBy(top_keys.head, top_keys.tail: _*)
    .orderBy(top_value)
  val rankCondition = "rn <= " + top_x
  val dfTop = df.withColumn("rn", row_number().over(w))
    .where(rankCondition).drop("rn")
  dfTop
}
But when I try to change it to orderBy(desc(top_value))
or orderBy(top_value.desc)
in line 4, I get a syntax error. What's the correct syntax here?
To sort a Spark DataFrame in descending order, use the desc property of the Column class or the desc() SQL function. Both sort() and orderBy() on a DataFrame/Dataset accept single or multiple columns in ascending or descending order, and the SQL sorting functions asc_nulls_first(), asc_nulls_last(), desc_nulls_first() and desc_nulls_last() additionally control where nulls are placed. Sorting is ascending by default, so desc is what switches a column to descending order.
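A minimal sketch of those options; the DataFrame and the column names "name"/"score" are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc_nulls_first}

val spark = SparkSession.builder().master("local[1]").appName("sort-demo").getOrCreate()
import spark.implicits._

// Hypothetical demo data; the None row shows how nulls are ordered.
val df = Seq(("a", Some(1)), ("b", Some(3)), ("c", None: Option[Int])).toDF("name", "score")

// Descending via the Column.desc property; Spark places nulls last for desc by default.
df.orderBy(col("score").desc).show()

// Descending, but force nulls to sort first.
df.orderBy(desc_nulls_first("score")).show()
```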
There are two versions of orderBy: one that works with strings and one that works with Column objects (API). Your code is using the first version, which does not allow for changing the sort order. You need to switch to the column version and then call the desc method, e.g., myCol.desc.
Now, we get into API design territory. The advantage of passing Column parameters is that you have a lot more flexibility, e.g., you can use expressions. If you want to maintain an API that takes a string rather than a Column, you need to convert the string to a column. There are a number of ways to do this, and the easiest is to use org.apache.spark.sql.functions.col(myColName).
Putting it all together, we get
.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
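A hedged sketch of the whole function with descending order. Two details differ from the code in the question and are my choices, not the only option: top_x is an Int here, and the filter is rn <= top_x so that exactly the top x rows per partition are kept; the partition keys use head/tail so the first key is not skipped.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Sketch: top x rows per partition, ordered descending by topValue.
def getTopX(df: DataFrame, topX: Int, topKey: String, topValue: String): DataFrame = {
  val topKeys = topKey.split(",").map(_.trim)
  val w = Window
    .partitionBy(topKeys.head, topKeys.tail: _*)
    .orderBy(col(topValue).desc) // Column-based orderBy, so .desc works
  df.withColumn("rn", row_number().over(w))
    .where(col("rn") <= topX)
    .drop("rn")
}
```

Called as getTopX(df, 3, "country, city", "sales"), this would keep the three highest-sales rows per (country, city) group.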
Say, for example, we need to order by a column called Date in descending order in the Window function. Use the $ shorthand before the column name (this requires import spark.implicits._), which turns the string into a Column and so enables the asc/desc syntax:

Window.orderBy($"Date".desc)

After specifying the column name in double quotes, append .desc to sort in descending order.
In Java:

Column col = new Column("ts");
col = col.desc();
WindowSpec w = Window.partitionBy("col1", "col2").orderBy(col);