What is the syntax to reverse the ordering for the takeOrdered() method of an RDD in Spark? For bonus points, what is the syntax for custom-ordering for an RDD in Spark?


How to reverse ordering for RDD.takeOrdered()?

2 Answers

Reverse Order

val seq = Seq(3, 9, 2, 3, 5, 4)
val rdd = sc.parallelize(seq, 2)
// Pass a reversed Ordering explicitly so the largest elements come first
rdd.takeOrdered(2)(Ordering[Int].reverse)

The result is Array(9, 5).
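To see what the reversed Ordering is doing without a cluster, here is a minimal plain-Scala sketch. It assumes no Spark at all: the local `sorted(...).take(2)` is only a stand-in for `takeOrdered`, and the object and value names (`ReverseOrderingDemo`, `descending`, `topTwo`) are made up for illustration.

```scala
// Plain Scala, no Spark required: shows what Ordering[Int].reverse does.
object ReverseOrderingDemo {
  val seq: Seq[Int] = Seq(3, 9, 2, 3, 5, 4)

  // The same kind of Ordering value that the answer passes to takeOrdered.
  val descending: Ordering[Int] = Ordering[Int].reverse

  // Local stand-in for rdd.takeOrdered(2)(descending): sort, then keep the first two.
  val topTwo: Seq[Int] = seq.sorted(descending).take(2)

  def main(args: Array[String]): Unit = {
    // Under the reversed ordering, 9 compares as "smaller" than 5, so it sorts first.
    println(descending.compare(9, 5) < 0) // true
    println(topTwo)                       // List(9, 5)
  }
}
```

Spark's RDD also has a top(n) method that returns the n largest elements under the implicit ordering, so rdd.top(2) should give the same elements as the reversed takeOrdered call here.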

Custom Order

Here we order people by age, oldest first.

case class Person(name: String, age: Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
val rdd = sc.parallelize(people, 2)
// Reverse the Int ordering, then apply it to each Person's age
rdd.takeOrdered(1)(Ordering[Int].reverse.on(x => x.age))

The result is Array(Person(ann,32)).
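The custom ordering can be written in more than one way. The sketch below is plain Scala (no Spark; `sorted(...).take(1)` again stands in for `takeOrdered`) and compares the spelling used above with `Ordering.by(...).reverse`, which some people find easier to read. The object and value names are hypothetical.

```scala
// Plain Scala, no Spark required: two equivalent ways to order Person by age, oldest first.
object PersonOrderingDemo {
  case class Person(name: String, age: Int)

  val people: Seq[Person] =
    Seq(Person("bob", 30), Person("ann", 32), Person("carl", 19))

  // Spelling from the answer: reverse the Int ordering, then map each Person to its age.
  val oldestFirstA: Ordering[Person] = Ordering[Int].reverse.on((p: Person) => p.age)

  // Alternative spelling: order by age, then reverse the resulting ordering.
  val oldestFirstB: Ordering[Person] = Ordering.by[Person, Int](_.age).reverse

  val oldestA: Seq[Person] = people.sorted(oldestFirstA).take(1)
  val oldestB: Seq[Person] = people.sorted(oldestFirstB).take(1)

  def main(args: Array[String]): Unit = {
    println(oldestA) // List(Person(ann,32))
    println(oldestB) // List(Person(ann,32))
  }
}
```

Either Ordering[Person] value could be passed as the second argument list of takeOrdered in place of the inline expression.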

107

answered Sep 18 '22 17:09

gasparms

val rdd1 = sc.parallelize(List("Hadoop PIG Hive", "Hive PIG PIG Hadoop", "Hadoop Hadoop Hadoop"))
// Split each line into words and pair each word with a count of 1
val rdd2 = rdd1.flatMap(x => x.split(" ")).map(x => (x, 1))
// Sum the counts per word
val rdd3 = rdd2.reduceByKey((x, y) => x + y)

//Reverse Order (Descending Order)

rdd3.takeOrdered(3)(Ordering[Int].reverse.on(x=>x._2))

Output:

res0: Array[(String, Int)] = Array((Hadoop,5), (PIG,3), (Hive,2))

//Ascending Order

rdd3.takeOrdered(3)(Ordering[Int].on(x=>x._2))

Output:

res1: Array[(String, Int)] = Array((Hive,2), (PIG,3), (Hadoop,5))
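One thing the orderings above leave open is what happens when two words have the same count. A possible way to make the order deterministic is to sort on a (negated count, word) pair. The sketch below is plain Scala with a local stand-in for the word count (no Spark), and `byCountDesc` is a name I made up; negating the count assumes counts never reach Int.MinValue.

```scala
// Plain Scala, no Spark required: descending by count, ties broken alphabetically by word.
object WordCountOrderingDemo {
  val lines: Seq[String] =
    Seq("Hadoop PIG Hive", "Hive PIG PIG Hadoop", "Hadoop Hadoop Hadoop")

  // Local stand-in for flatMap + map + reduceByKey on the RDD.
  val counts: Seq[(String, Int)] =
    lines
      .flatMap(_.split(" ").toSeq)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq

  // Sort key is (-count, word): larger counts first, then words in alphabetical order.
  val byCountDesc: Ordering[(String, Int)] =
    Ordering.by[(String, Int), (Int, String)] { case (word, count) => (-count, word) }

  val top3: Seq[(String, Int)] = counts.sorted(byCountDesc).take(3)

  def main(args: Array[String]): Unit = {
    println(top3) // List((Hadoop,5), (PIG,3), (Hive,2))
  }
}
```

The same byCountDesc value should be usable as rdd3.takeOrdered(3)(byCountDesc), since takeOrdered only needs an Ordering for the element type.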

answered Sep 19 '22 17:09

Prabhat Jain


Tags:

apache-spark

rdd

StackG
