I need a window function that partitions by some keys (=column names), orders by another column name and returns the rows with top x ranks.
This works fine for ascending order:
def getTopX(df: DataFrame, top_x: String, top_key: String, top_value: String): DataFrame = {
  val top_keys: List[String] = top_key.split(",").map(_.trim).toList
  val w = Window.partitionBy(top_keys.head, top_keys.tail: _*)
    .orderBy(top_value)
  val rankCondition = "rn <= " + top_x
  val dfTop = df.withColumn("rn", row_number().over(w))
    .where(rankCondition).drop("rn")
  dfTop
}
But when I try to change it to orderBy(desc(top_value))
or orderBy(top_value.desc)
in line 4, I get a syntax error. What's the correct syntax here?
To sort a Spark DataFrame in descending order, use the desc property of the Column class or the desc() SQL function. Both sort() and orderBy() on a DataFrame/Dataset accept single or multiple columns in ascending or descending order, and the SQL sorting functions asc_nulls_first(), asc_nulls_last(), desc_nulls_first() and desc_nulls_last() additionally control where nulls are placed. Sorting is ascending by default, so desc is what switches a column to descending order.
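A minimal sketch of those options; the DataFrame and the column names "name"/"score" are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc_nulls_first}

val spark = SparkSession.builder().master("local[1]").appName("sort-demo").getOrCreate()
import spark.implicits._

// Hypothetical demo data; the None row shows how nulls are ordered.
val df = Seq(("a", Some(1)), ("b", Some(3)), ("c", None: Option[Int])).toDF("name", "score")

// Descending via the Column.desc property; Spark places nulls last for desc by default.
df.orderBy(col("score").desc).show()

// Descending, but force nulls to sort first.
df.orderBy(desc_nulls_first("score")).show()
```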
There are two versions of orderBy: one that works with strings and one that works with Column objects (API). Your code is using the first version, which does not allow for changing the sort order. You need to switch to the column version and then call the desc method, e.g., myCol.desc.
Now, we get into API design territory. The advantage of passing Column parameters is that you have a lot more flexibility, e.g., you can use expressions. If you want to maintain an API that takes a string rather than a Column, you need to convert the string to a column. There are a number of ways to do this, and the easiest is to use org.apache.spark.sql.functions.col(myColName).
Putting it all together, we get
.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
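A hedged sketch of the whole function with descending order. Two details differ from the code in the question and are my choices, not the only option: top_x is an Int here, and the filter is rn <= top_x so that exactly the top x rows per partition are kept; the partition keys use head/tail so the first key is not skipped.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Sketch: top x rows per partition, ordered descending by topValue.
def getTopX(df: DataFrame, topX: Int, topKey: String, topValue: String): DataFrame = {
  val topKeys = topKey.split(",").map(_.trim)
  val w = Window
    .partitionBy(topKeys.head, topKeys.tail: _*)
    .orderBy(col(topValue).desc) // Column-based orderBy, so .desc works
  df.withColumn("rn", row_number().over(w))
    .where(col("rn") <= topX)
    .drop("rn")
}
```

Called as getTopX(df, 3, "country, city", "sales"), this would keep the three highest-sales rows per (country, city) group.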
Say, for example, we need to order by a column called Date in descending order in the Window function. Use the $ shorthand before the column name (this requires import spark.implicits._), which turns the string into a Column and so enables the asc/desc syntax:

Window.orderBy($"Date".desc)

After specifying the column name in double quotes, append .desc to sort in descending order.
In Java:

Column col = new Column("ts");
col = col.desc();
WindowSpec w = Window.partitionBy("col1", "col2").orderBy(col);