 

rank() function usage in Spark SQL

I need some pointers on using rank().

I have extracted a column from a dataset and need to rank it.

Dataset<Row> inputDSAAcolonly = inputDataset.select("Colname");
Dataset<Row> DSColAwithIndex = inputDSAAcolonly.withColumn("df1Rank", rank());

DSColAwithIndex.show();

I know I can sort the column and then append an index column to get a rank (roughly as sketched below), but I am curious about the syntax and usage of rank().
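For context, the sort-then-append-an-index approach I mentioned might look roughly like this (just an illustrative sketch, assuming the column is named "Colname"; zipWithIndex gives every row a distinct index, so tied values do not share a value the way rank() would):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// sort the single column, then attach a 0-based index to each row
Dataset<Row> sorted = inputDataset.select("Colname").orderBy("Colname");
JavaPairRDD<Row, Long> indexed = sorted.toJavaRDD().zipWithIndex();
indexed.take(5).forEach(t -> System.out.println(t._1().get(0) + " -> " + (t._2() + 1)));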

Asked Mar 06 '17 by Binu

1 Answer

A window spec needs to be specified for rank():

val w = org.apache.spark.sql.expressions.Window.orderBy("date") // some window spec

val leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w))

Edit: Java version of the answer, since the OP is using Java:

import static org.apache.spark.sql.functions.rank;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

WindowSpec w = Window.orderBy("Colname"); // the column to rank by
Dataset<Row> leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w));
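For a quick end-to-end illustration (a self-contained sketch; the local SparkSession and the sample values below are only assumptions for demonstration), note that rank() gives tied values the same rank and leaves a gap after them:

import static org.apache.spark.sql.functions.rank;

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

SparkSession spark = SparkSession.builder().appName("rankDemo").master("local[*]").getOrCreate();

// toy column with a tie: 10, 20, 20, 30
Dataset<Row> df = spark.createDataset(Arrays.asList(10, 20, 20, 30), Encoders.INT()).toDF("Colname");

WindowSpec w = Window.orderBy("Colname");
df.withColumn("df1Rank", rank().over(w)).show();
// expected ranks: 1, 2, 2, 4 -- the tie at 20 shares rank 2 and rank 3 is skipped

Also note that Window.orderBy without partitionBy pulls all rows into a single partition (Spark logs a warning about it), so for larger data you would normally add Window.partitionBy(...) to the spec.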
Answered Oct 16 '22 by mrsrinivas