How do we rank rows within a DataFrame?

I have a sample DataFrame as below:

Input:

accountNumber   assetValue  
A100            1000         
A100            500          
B100            600          
B100            200

Expected output:

accountNumber   assetValue  Rank
A100            1000        1
A100            500         2
B100            600         1
B100            200         2

My question is how to add this rank column to the DataFrame, which is sorted by account number. As the expected output shows, the rank restarts for each account and follows assetValue in descending order. I am not expecting a huge volume of rows, so I am open to ideas that do this outside of the DataFrame.

I am using Spark 1.5 with SQLContext, hence I cannot use window functions.

769

asked Mar 23 '17 03:03

user3293666

2 Answers

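You can use the row_number function with a Window specification, which lets you set the partition and order columns:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
// toDF and $ come from the SQL implicits: already in scope in spark-shell,
// otherwise import them from your context (e.g. sqlContext.implicits._).

val df = Seq(("A100", 1000), ("A100", 500), ("B100", 600), ("B100", 200)).toDF("accountNumber", "assetValue")

// Number the rows within each account, highest assetValue first.
val byAccount = Window.partitionBy($"accountNumber").orderBy($"assetValue".desc)
df.withColumn("rank", row_number().over(byAccount)).show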

+-------------+----------+----+
|accountNumber|assetValue|rank|
+-------------+----------+----+
|         A100|      1000|   1|
|         A100|       500|   2|
|         B100|       600|   1|
|         B100|       200|   2|
+-------------+----------+----+
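The question mentions Spark 1.5 with a plain SQLContext and a small number of rows. As far as I recall, window functions on Spark 1.x needed a HiveContext, so a fallback is to rank outside the DataFrame with ordinary Scala collections. This is only a sketch: the `Asset` and `Ranked` case classes and the `rankWithinAccount` helper are hypothetical names, and it assumes the data fits comfortably in driver memory.

```scala
// Hypothetical row types for illustration; not part of the original question.
case class Asset(accountNumber: String, assetValue: Int)
case class Ranked(accountNumber: String, assetValue: Int, rank: Int)

// Rank rows within each account, highest assetValue first
// (row_number semantics: tied values still get distinct ranks).
def rankWithinAccount(rows: Seq[Asset]): Seq[Ranked] =
  rows
    .groupBy(_.accountNumber)                  // one group per account
    .toSeq
    .sortBy { case (account, _) => account }   // keep accounts in order
    .flatMap { case (_, group) =>
      group
        .sortBy(_.assetValue)(Ordering[Int].reverse)
        .zipWithIndex
        .map { case (r, i) => Ranked(r.accountNumber, r.assetValue, i + 1) }
    }

val rows = Seq(
  Asset("A100", 1000), Asset("A100", 500),
  Asset("B100", 600), Asset("B100", 200)
)
val ranked = rankWithinAccount(rows)
ranked.foreach(println)
```

With a real DataFrame, the rows would first be pulled to the driver (for example with `df.collect()`) and mapped into `Asset`. That is reasonable only because the row count is small.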

110

answered Oct 25 '22 04:10

Psidom

Using raw SQL:

val df = sc.parallelize(Seq(
  ("A100", 1000), ("A100", 500), ("B100", 600), ("B100", 200)
)).toDF("accountNumber", "assetValue")

// Expose the DataFrame to SQL under the name "df", then rank within each account.
df.registerTempTable("df")
sqlContext.sql("""
  SELECT accountNumber, assetValue,
         RANK() OVER (PARTITION BY accountNumber ORDER BY assetValue DESC) AS rank
  FROM df
""").show

+-------------+----------+----+
|accountNumber|assetValue|rank|
+-------------+----------+----+
|         A100|      1000|   1|
|         A100|       500|   2|
|         B100|       600|   1|
|         B100|       200|   2|
+-------------+----------+----+
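Note that this query uses RANK() while the other answer uses row_number. They agree on the sample data only because no two rows in an account share an assetValue. The sketch below mimics both in plain Scala on made-up values with a tie; `rowNumber` and `rank` are hypothetical local names, not Spark APIs.

```scala
// Made-up asset values for a single account, with a tie at 500.
val values = Seq(1000, 500, 500, 200)
val sorted = values.sorted(Ordering[Int].reverse)

// row_number semantics: consecutive positions, even for tied values.
val rowNumber: Seq[(Int, Int)] =
  sorted.zipWithIndex.map { case (v, i) => (v, i + 1) }

// RANK semantics: tied values share a rank and the next rank skips ahead.
// A value's rank is one more than the count of strictly larger values.
val rank: Seq[(Int, Int)] =
  sorted.map(v => (v, sorted.count(_ > v) + 1))

println(rowNumber) // List((1000,1), (500,2), (500,3), (200,4))
println(rank)      // List((1000,1), (500,2), (500,2), (200,4))
```

If ties are possible in the real data, which of the two behaviours is wanted decides which function to pick.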

answered Oct 25 '22 04:10

Nayan Sharma

Tags:

scala

apache-spark

apache-spark-sql
