If I have an RDD of 5-element tuples, e.g., RDD[(Double, String, Int, Double, Double)],
how can I sort this RDD efficiently by the fifth element?
I tried mapping this RDD into key-value pairs and using sortByKey, but sortByKey seems quite slow; it is even slower than collecting the RDD and sorting the collected array with sortWith. Why is that?
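For concreteness, here is roughly what the two approaches I am comparing look like (a simplified sketch; myRdd stands in for the actual 5-tuple RDD):
// Approach 1: map into (key, value) pairs keyed on the 5th element, then sortByKey
val viaSortByKey = myRdd.map(t => (t._5, t)).sortByKey().values
// Approach 2: collect to the driver and sort the resulting array locally
val viaCollect = myRdd.collect().sortWith(_._5 < _._5)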
Thank you very much.
Spark RDD sortByKey() syntax: ascending specifies the sort order; it is true by default, meaning ascending order, and false gives descending order. numPartitions specifies the number of partitions to create for the result of sortByKey().
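As a sketch of those two parameters (assuming a SparkContext sc and a made-up pair RDD rddPairs):
val rddPairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
rddPairs.sortByKey()        // ascending order (the default)
rddPairs.sortByKey(false)   // descending order
rddPairs.sortByKey(true, 4) // ascending order, result split into 4 partitions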
Method 1: Using sortBy(). sortBy() is used to sort the data by value efficiently in PySpark. It is a method available on an RDD and takes a lambda expression that selects the field to sort on.
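A minimal Scala sketch of the same idea (the excerpt mentions PySpark, but sortBy works the same way on a Scala RDD; pairs is a made-up example RDD):
val pairs = sc.parallelize(Seq(("a", 3), ("b", 1), ("c", 2)))
pairs.sortBy(_._2).collect()                    // sorted by value, ascending: (b,1), (c,2), (a,3)
pairs.sortBy(_._2, ascending = false).collect() // sorted by value, descending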
The action count() returns the number of elements in an RDD. For example, if an RDD named rdd holds the values {1, 2, 2, 3, 4, 5, 5, 6}, rdd.count() returns 8.
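For instance (a trivial sketch):
val rdd = sc.parallelize(Seq(1, 2, 2, 3, 4, 5, 5, 6))
rdd.count() // returns 8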
sortWithinPartitions() is more efficient than orderBy() because the data is sorted on each partition individually, with no shuffle, which is also why the overall order of the output is not guaranteed. orderBy() (an alias of sort()) instead performs a full shuffle to produce a single, globally ordered result.
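A sketch of the difference on a DataFrame (assuming a SparkSession named spark; the column names are made up):
import org.apache.spark.sql.functions.col
val df = spark.range(0, 100).withColumn("v", col("id") % 7)
val locallySorted  = df.sortWithinPartitions(col("v")) // each partition sorted on its own, no global order
val globallySorted = df.orderBy(col("v").desc)         // shuffle-based sort, globally ordered output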
You can do this with sortBy, acting directly on the RDD:
myRdd.sortBy(_._5) // Sort by 5th field of each 5-tuple
There are extra optional parameters to define the sort order (ascending) and the number of partitions (numPartitions).
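For the 5-tuple RDD above, they might be used like this (a sketch):
myRdd.sortBy(_._5, ascending = false, numPartitions = 8) // descending by 5th field, 8 output partitions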
If you want to sort in descending order and the element you are sorting on is numeric (e.g. Int), you can negate it with a "-" sign to sort the RDD in descending order.
For example:
Given an RDD of (String, Int) tuples, to sort it by its 2nd element in descending order:
rdd.sortBy(x => -x._2).collect().foreach(println)
Given an RDD of (String, String) tuples, where the "-" trick does not apply, pass ascending = false to sort by the 2nd element in descending order:
rdd.sortBy(x => x._2, false).collect().foreach(println)
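Putting both variants together in a self-contained sketch (assuming a SparkContext sc; the data is made up):
val wordCounts = sc.parallelize(Seq(("spark", 10), ("rdd", 3), ("sort", 7)))
wordCounts.sortBy(x => -x._2).collect().foreach(println)              // (spark,10), (sort,7), (rdd,3)
wordCounts.sortBy(_._2, ascending = false).collect().foreach(println) // same order, works for any Ordering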