I'm a bit confused on how Spark's rdd.take(n) and rdd.takeOrdered(n) works. Can someone explain to me these two methods with some examples? Thanks.
In order to explain how ordering works we create an RDD with integers from 0 to 99:
val myRdd = sc.parallelize(Seq.range(0, 100))
We can now perform:
myRdd.take(5)
Which will extract the first 5 elements of the RDD and we will obtain an Array[Int] containig the first 5 integers of myRDD: '0 1 2 3 4 5' (with no ordering function, just the first 5 elements in the first 5 position)
The takeOrdered(5) operation works in a similar way: it will extract the first 5 elements of the RDD as an Array[Int] but we have to opportunity to specify the ordering criteria:
myRdd.takeOrdered(5)( Ordering[Int].reverse)
Will extract the first 5 elements according to specified ordering. In our case the result will be: '99 98 97 96 95'
If you have a more complex data structure in your RDD you may want to perform your own ordering function with the operation:
myRdd.takeOrdered(5)( Ordering[Int].reverse.on { x => ??? })
Which will extract the first 5 elements of your RDD as an Array[Int] according to your custom ordering function.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With