Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

On Spark's RDD's take and takeOrdered methods

I'm a bit confused on how Spark's rdd.take(n) and rdd.takeOrdered(n) works. Can someone explain to me these two methods with some examples? Thanks.

like image 910
oikonomiyaki Avatar asked Nov 06 '15 09:11

oikonomiyaki


1 Answers

In order to explain how ordering works we create an RDD with integers from 0 to 99:

val myRdd = sc.parallelize(Seq.range(0, 100))

We can now perform:

myRdd.take(5)

Which will extract the first 5 elements of the RDD and we will obtain an Array[Int] containig the first 5 integers of myRDD: '0 1 2 3 4 5' (with no ordering function, just the first 5 elements in the first 5 position)

The takeOrdered(5) operation works in a similar way: it will extract the first 5 elements of the RDD as an Array[Int] but we have to opportunity to specify the ordering criteria:

myRdd.takeOrdered(5)( Ordering[Int].reverse)

Will extract the first 5 elements according to specified ordering. In our case the result will be: '99 98 97 96 95'

If you have a more complex data structure in your RDD you may want to perform your own ordering function with the operation:

myRdd.takeOrdered(5)( Ordering[Int].reverse.on { x => ??? })

Which will extract the first 5 elements of your RDD as an Array[Int] according to your custom ordering function.

like image 68
drstein Avatar answered Sep 20 '22 22:09

drstein