What is the syntax to reverse the ordering for the takeOrdered() method of an RDD in Spark?
For bonus points, what is the syntax for custom-ordering for an RDD in Spark?
Method 1: Using sortBy()
sortBy() is an RDD method, available in both the Scala and PySpark APIs, that sorts the elements by a key computed from each element. The key is usually given as a lambda expression. Pass ascending = false to sort in descending order.
sortByKey() sorts an RDD of key-value pairs by key, so that each partition contains a sorted range of the elements. Calling collect or save on the resulting RDD returns or writes an ordered list of records (in the save case, they are written to multiple part-X files in the filesystem, in key order).
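As a rough illustration of the two sort methods described above, here is a minimal sketch that runs Spark in local mode. The object name, app name, and local[2] master are assumptions made for this example only, and the input pairs are borrowed from the word-count answer further down.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortThenTakeSketch {
  // Returns (pairs sorted by key, descending; top two pairs by count)
  def run(): (Array[(String, Int)], Array[(String, Int)]) = {
    // Assumption: app name and local two-thread master are chosen only for this sketch
    val conf = new SparkConf().setAppName("sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val counts = sc.parallelize(Seq(("Hadoop", 5), ("PIG", 3), ("Hive", 2)), 2)
      // sortByKey orders by the String key; ascending = false reverses the order
      val byKeyDesc = counts.sortByKey(ascending = false).collect()
      // sortBy orders by any derived key, here the count in the second field
      val topTwoByCount = counts.sortBy(_._2, ascending = false).take(2)
      (byKeyDesc, topTwoByCount)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (byKeyDesc, topTwoByCount) = run()
    println(byKeyDesc.mkString(", "))
    println(topTwoByCount.mkString(", "))
  }
}
```

Both calls sort the whole RDD before any elements are taken. When only the first few elements are needed, takeOrdered() is usually the lighter choice, since it does not need a full sort.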
Reverse Order
val seq = Seq(3,9,2,3,5,4)
val rdd = sc.parallelize(seq,2)
rdd.takeOrdered(2)(Ordering[Int].reverse)
Result will be Array(9,5)
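The second argument list of takeOrdered() takes an ordinary scala.math.Ordering, so the ordering itself can be checked without a cluster. The sketch below uses a local Seq as a stand-in for the RDD; the object and value names are made up for illustration.

```scala
object ReverseOrderingSketch {
  val seq = Seq(3, 9, 2, 3, 5, 4)

  // The same Ordering value the answer passes to takeOrdered
  val descending: Ordering[Int] = Ordering[Int].reverse

  // Local stand-in for rdd.takeOrdered(2)(descending):
  // sort with the ordering, then keep the first two elements
  val largestTwo: Seq[Int] = seq.sorted(descending).take(2)

  def main(args: Array[String]): Unit =
    println(largestTwo) // List(9, 5)
}
```

For this particular case, rdd.top(2) should give the same result, because top() is takeOrdered() with the ordering reversed.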
Custom Order
We will order people by age, oldest first, and take the first one.
case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
val rdd = sc.parallelize(people,2)
rdd.takeOrdered(1)(Ordering[Int].reverse.on(x=>x.age))
Result will be Array(Person(ann,32))
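A custom ordering can also be built with Ordering.by, which some find easier to read than .on. The sketch below again uses a local Seq in place of the RDD; the names oldestFirst, viaOn, and oldestThenName are hypothetical, and the tie-break on name is an addition that is not part of the original answer.

```scala
object CustomOrderingSketch {
  case class Person(name: String, age: Int)

  val people = Seq(Person("bob", 30), Person("ann", 32), Person("carl", 19))

  // Equivalent ways to say "largest age first"
  val oldestFirst: Ordering[Person] = Ordering.by[Person, Int](_.age).reverse
  val viaOn: Ordering[Person] = Ordering[Int].reverse.on((p: Person) => p.age)

  // Assumption for illustration: break ties on age by name, ascending
  val oldestThenName: Ordering[Person] =
    Ordering.by[Person, (Int, String)](p => (-p.age, p.name))

  // Local stand-in for rdd.takeOrdered(1)(oldestFirst)
  val oldest: Seq[Person] = people.sorted(oldestFirst).take(1)

  def main(args: Array[String]): Unit =
    println(oldest) // List(Person(ann,32))
}
```

Any of these Ordering[Person] values can be passed as the second argument list of takeOrdered(), for example rdd.takeOrdered(1)(oldestFirst).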
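Another example: count words, then order the (word, count) pairs by count.
val rdd1 = sc.parallelize(List("Hadoop PIG Hive", "Hive PIG PIG Hadoop", "Hadoop Hadoop Hadoop"))
// Split each line into words and pair every word with a count of 1
val rdd2 = rdd1.flatMap(x => x.split(" ")).map(x => (x, 1))
// Sum the counts for each word
val rdd3 = rdd2.reduceByKey((x, y) => x + y)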
//Reverse Order (Descending Order)
rdd3.takeOrdered(3)(Ordering[Int].reverse.on(x=>x._2))
Output:
res0: Array[(String, Int)] = Array((Hadoop,5), (PIG,3), (Hive,2))
//Ascending Order
rdd3.takeOrdered(3)(Ordering[Int].on(x=>x._2))
Output:
res1: Array[(String, Int)] = Array((Hive,2), (PIG,3), (Hadoop,5))