 

Spark RDD sort by two values

I have an RDD of (name: String, popularity: Int, rank: Int). I want to sort it by rank and, where ranks are equal, by popularity. Currently I do this with two transformations:

val result = myRDD
        .sortBy(_._2, ascending = false)
        .sortBy(_._3, ascending = false)
        .take(10)

Can I do it in one transformation?

asked May 01 '16 at 05:05 by safat siddiqui


1 Answer

You can map the RDD to key-value pairs where the key is a tuple of (rank, popularity) and the value is the name, and then sort by the key. Tuples are ordered field by field, so rank is compared first and popularity only breaks ties.
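This works because Scala's built-in `Ordering` for tuples compares the first element and falls back to the second only on a tie. Here is a minimal plain-Scala sketch (no Spark required). The object name `TupleKeyOrderDemo` and the sample rows are made up for illustration. It shows the order that a descending (rank, popularity) key produces:

```scala
// Plain Scala (no Spark): shows how an Ordering on (rank, popularity) tuples behaves.
// TupleKeyOrderDemo and the sample rows are hypothetical, for illustration only.
object TupleKeyOrderDemo {
  // (name, popularity, rank), same field layout as the question's RDD
  val rows: Seq[(String, Int, Int)] = Seq(
    ("a", 10, 2),
    ("b", 50, 1),
    ("c", 30, 2),
    ("d", 20, 1)
  )

  // Key = (rank, popularity). The implicit Ordering[(Int, Int)] compares rank first
  // and only looks at popularity when the ranks are equal.
  // .reverse makes both fields descending, which is what ascending = false does in Spark.
  val sorted: Seq[(String, Int, Int)] =
    rows.sortBy(r => (r._3, r._2))(Ordering[(Int, Int)].reverse)

  def main(args: Array[String]): Unit =
    sorted.foreach(println)
}
```

With these rows, the names come out in the order c, a, b, d. The two rank-2 rows come first, ordered by popularity, followed by the two rank-1 rows.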

For example:

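// _._1 - name
// _._2 - popularity
// _._3 - rank
val top10 = myRDD
        .map(line => ((line._3, line._2), line._1)) // key = (rank, popularity), value = name
        .sortBy(_._1, ascending = false)            // sort by the tuple key, descending
        .take(10)                                   // take returns an Array[((Int, Int), String)], not an RDD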
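Spark's `RDD.sortBy` accepts any key type that has an implicit `Ordering` and `ClassTag`. That means the composite key can also be built inline, without the intermediate key-value RDD. As far as I know, Spark does not document `sortBy` as a stable sort. Chaining two `sortBy` calls (as in the question) is therefore not guaranteed to preserve the popularity order within equal ranks, and a single composite key avoids relying on that.

The sketch below assumes spark-core is on the classpath and runs in local mode. `SortByTwoKeys`, its method names and the sample data are hypothetical. The second method shows one way to mix directions (rank descending, popularity ascending) by passing an explicit `Ordering`:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.classTag

// Hypothetical helper object; rows are (name, popularity, rank) as in the question.
object SortByTwoKeys {
  // One sortBy on a composite key: rank descending, then popularity descending.
  def topByRankThenPopularity(rdd: RDD[(String, Int, Int)], n: Int): Array[(String, Int, Int)] =
    rdd.sortBy(r => (r._3, r._2), ascending = false).take(n)

  // Explicit key ordering: rank descending, popularity ascending.
  private val rankDescPopAsc: Ordering[(Int, Int)] =
    Ordering.Tuple2(Ordering.Int.reverse, Ordering.Int)

  // ascending = true so Spark applies rankDescPopAsc exactly as written.
  // Both implicits (Ordering and ClassTag) must be supplied when passing one explicitly.
  def topByRankThenLowPopularity(rdd: RDD[(String, Int, Int)], n: Int): Array[(String, Int, Int)] =
    rdd.sortBy(r => (r._3, r._2), ascending = true)(rankDescPopAsc, classTag[(Int, Int)]).take(n)

  def main(args: Array[String]): Unit = {
    // Local mode so the sketch runs without a cluster
    val sc = new SparkContext(new SparkConf().setAppName("sort-by-two-keys").setMaster("local[2]"))
    try {
      // Invented sample data for illustration
      val myRDD = sc.parallelize(Seq(("a", 10, 2), ("b", 50, 1), ("c", 30, 2), ("d", 20, 1)))
      topByRankThenPopularity(myRDD, 10).foreach(println)
      topByRankThenLowPopularity(myRDD, 10).foreach(println)
    } finally sc.stop()
  }
}
```

With the sample data, the first method yields c, a, b, d and the second yields a, c, d, b. Supplying a custom `Ordering` is an alternative to negating a numeric field, and it avoids the overflow that negation hits at `Int.MinValue`.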
answered Sep 24 '22 at 12:09 by Avihoo Mamka