
How to reverse ordering for RDD.takeOrdered()?

What is the syntax to reverse the ordering for the takeOrdered() method of an RDD in Spark?

For bonus points, what is the syntax for a custom ordering on an RDD in Spark?

StackG asked Oct 15 '14 16:10



2 Answers

Reverse Order

val seq = Seq(3,9,2,3,5,4)
val rdd = sc.parallelize(seq,2)
rdd.takeOrdered(2)(Ordering[Int].reverse)

The result is Array(9, 5): the two largest elements, largest first.
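The second argument list of takeOrdered takes an ordinary scala.math.Ordering, so the effect of .reverse can be checked without a cluster. Here is a minimal sketch, assuming a plain Scala collection as a local stand-in for the RDD. The names ReverseOrderingSketch and takeOrderedLocal are hypothetical and not part of Spark.

```scala
object ReverseOrderingSketch {
  // Hypothetical local stand-in for rdd.takeOrdered(n)(ord):
  // sort by the given ordering, then keep the first n elements.
  def takeOrderedLocal[T](xs: Seq[T], n: Int)(implicit ord: Ordering[T]): Seq[T] =
    xs.sorted(ord).take(n)

  val seq = Seq(3, 9, 2, 3, 5, 4)

  // Default (ascending) ordering: the two smallest elements.
  val smallest: Seq[Int] = takeOrderedLocal(seq, 2)

  // Reversed ordering: the two largest elements, largest first.
  val largest: Seq[Int] = takeOrderedLocal(seq, 2)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(smallest) // List(2, 3)
    println(largest)  // List(9, 5)
  }
}
```

On a real RDD, rdd.top(2) is a shorter way to get the same elements, since top returns the largest elements under the implicit ordering.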

Custom Order

Order people by age, oldest first, and take the first one.

case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
val rdd = sc.parallelize(people,2)
rdd.takeOrdered(1)(Ordering[Int].reverse.on(x=>x.age))

The result is Array(Person(ann,32)), the oldest person.
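A custom ordering can also be built with Ordering.by or written out by hand. The sketch below shows both, again assuming a plain Scala collection in place of the RDD (sorted(ord).take(n) stands in for rdd.takeOrdered(n)(ord)). The names CustomOrderingSketch, byAgeDesc and byAgeAsc are hypothetical.

```scala
object CustomOrderingSketch {
  case class Person(name: String, age: Int)

  // Oldest first; same ordering as Ordering[Int].reverse.on((p: Person) => p.age).
  val byAgeDesc: Ordering[Person] = Ordering.by[Person, Int](_.age).reverse

  // Hand-written alternative: youngest first.
  val byAgeAsc: Ordering[Person] = new Ordering[Person] {
    def compare(a: Person, b: Person): Int = Integer.compare(a.age, b.age)
  }

  val people = Seq(Person("bob", 30), Person("ann", 32), Person("carl", 19))

  // Local stand-in for rdd.takeOrdered(1)(ord): sort by ord, keep the first element.
  val oldest: Seq[Person] = people.sorted(byAgeDesc).take(1)
  val youngest: Seq[Person] = people.sorted(byAgeAsc).take(1)

  def main(args: Array[String]): Unit = {
    println(oldest)   // List(Person(ann,32))
    println(youngest) // List(Person(carl,19))
  }
}
```

Either value can be passed to takeOrdered in the same position as the inline expression, for example rdd.takeOrdered(1)(byAgeDesc).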

gasparms answered Sep 18 '22 17:09


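The same approach works for key-value pairs, for example ordering word counts by count:

// Word count: split each line into words, then sum the occurrences of each word.
val rdd1 = sc.parallelize(List("Hadoop PIG Hive", "Hive PIG PIG Hadoop", "Hadoop Hadoop Hadoop"))

val rdd2 = rdd1.flatMap(x => x.split(" ")).map(x => (x, 1))

val rdd3 = rdd2.reduceByKey((x, y) => x + y)

// Reverse order (descending by count)

rdd3.takeOrdered(3)(Ordering[Int].reverse.on(x => x._2))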

Output:

res0: Array[(String, Int)] = Array((Hadoop,5), (PIG,3), (Hive,2))

//Ascending Order

rdd3.takeOrdered(3)(Ordering[Int].on(x=>x._2))

Output:

res1: Array[(String, Int)] = Array((Hive,2), (PIG,3), (Hadoop,5))
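The counts above have no ties. When two words can share a count, an ordering on the count alone does not say which of them comes first, so a tie-breaker on the word makes the result deterministic. A minimal sketch, assuming hypothetical counts and a plain Scala collection in place of rdd3; the names WordCountOrderingSketch and byCountDescThenWord are illustrative only.

```scala
object WordCountOrderingSketch {
  // Hypothetical counts, not the output above; "Hive" and "Pig" tie on purpose.
  val counts = Seq(("Hive", 2), ("Hadoop", 5), ("Pig", 2), ("Spark", 3))

  // Highest count first; words with equal counts are ordered alphabetically.
  // Negating the count turns the ascending tuple ordering into a descending one.
  val byCountDescThenWord: Ordering[(String, Int)] =
    Ordering.by[(String, Int), (Int, String)] { case (word, count) => (-count, word) }

  // Local stand-in for rdd3.takeOrdered(3)(byCountDescThenWord).
  val top3: Seq[(String, Int)] = counts.sorted(byCountDescThenWord).take(3)

  def main(args: Array[String]): Unit = {
    println(top3) // List((Hadoop,5), (Spark,3), (Hive,2))
  }
}
```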

Prabhat Jain answered Sep 19 '22 17:09