Consider
val animals = List("penguin","ferret","cat").toSeq
val rdd = sc.makeRDD(animals, 1)
I would like to order this RDD. I'm new to Scala and a little confused about how this is to be done.
If you build an RDD from a file (e.g. with sc.textFile), the lines of the RDD will be in the order they appear in the file. map, filter, flatMap, and coalesce (with shuffle = false) preserve that order because, like most RDD operations, they work on iterators inside the partitions, so they have no opportunity to reorder the elements.
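For instance, a minimal sketch (assuming a SparkContext named sc is in scope, as in the question):

val lines = sc.makeRDD(Seq("a", "b", "c"), 1)
val upper = lines.map(_.toUpperCase)   // element-wise; order within the partition is kept
upper.collect()                        // Array(A, B, C)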
Method 1: Using sortBy(). sortBy() sorts the data by value and is available directly on an RDD (in both the Scala and PySpark APIs). It takes a lambda expression (a key function) that extracts the value to sort by, for example a particular field or column.
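As a rough Scala sketch of that idea (the snippet above mentions PySpark, but the RDD API has the same shape; the tuple data here is made up for illustration):

val rows = sc.makeRDD(Seq(("cat", 3), ("penguin", 1), ("ferret", 2)))
rows.sortBy(_._2).collect()   // sort by the second field: Array((penguin,1), (ferret,2), (cat,3))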
A paired RDD is a distributed collection of key-value pairs. It is a specialization of the Resilient Distributed Dataset, so it has all the features of an ordinary RDD plus additional operations for working with keys and values, including many transformations such as key-based sorting.
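As a small illustration (again assuming sc, with made-up pairs), sortByKey sorts a paired RDD by its keys:

val pairs = sc.makeRDD(Seq(("penguin", 1), ("ferret", 2), ("cat", 3)))
pairs.sortByKey().collect()   // Array((cat,3), (ferret,2), (penguin,1))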
The RDD documentation can be found here. Look at sortBy:
sortBy[K](
    f: (T) ⇒ K,
    ascending: Boolean = true,
    numPartitions: Int = this.partitions.size
)
The K is the type of the sort key that f produces from each element of the RDD. f is a function, which you can either define elsewhere with def and pass by name, or create anonymously inline (which is more Scala-like). ascending and numPartitions should be self-explanatory.
So given all this, try:
rdd.sortBy[String]({animal => animal})
Then try this:
rdd.sortBy[String]({animal => animal}, false)
And then this one, which sorts the RDD by the number of letters "e" in the name of the animal, from most to least:
rdd.sortBy[Int]({a => a.split("").filter(char => char == "e").size}, false)
Note that the original rdd isn't sorted in place; the operation returns a new, sorted RDD.
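For example, collecting the three calls above (assuming the single-partition rdd from the question) should give:

rdd.sortBy[String]({animal => animal}).collect()
// Array(cat, ferret, penguin)
rdd.sortBy[String]({animal => animal}, false).collect()
// Array(penguin, ferret, cat)
rdd.sortBy[Int]({a => a.split("").filter(char => char == "e").size}, false).collect()
// Array(ferret, penguin, cat)   ("ferret" has two e's, "penguin" one, "cat" none)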