On Spark's RDD's take and takeOrdered methods

Question

I'm a bit confused on how Spark's rdd.take(n) and rdd.takeOrdered(n) works. Can someone explain to me these two methods with some examples? Thanks.

drstein · Accepted Answer

In order to explain how ordering works we create an RDD with integers from 0 to 99:

val myRdd = sc.parallelize(Seq.range(0, 100))

We can now perform:

myRdd.take(5)

Which will extract the first 5 elements of the RDD and we will obtain an Array[Int] containig the first 5 integers of myRDD: '0 1 2 3 4 5' (with no ordering function, just the first 5 elements in the first 5 position)

The takeOrdered(5) operation works in a similar way: it will extract the first 5 elements of the RDD as an Array[Int] but we have to opportunity to specify the ordering criteria:

myRdd.takeOrdered(5)( Ordering[Int].reverse)

Will extract the first 5 elements according to specified ordering. In our case the result will be: '99 98 97 96 95'

If you have a more complex data structure in your RDD you may want to perform your own ordering function with the operation:

myRdd.takeOrdered(5)( Ordering[Int].reverse.on { x => ??? })

Which will extract the first 5 elements of your RDD as an Array[Int] according to your custom ordering function.

On Spark's RDD's take and takeOrdered methods

Tags:

scala

apache-spark

oikonomiyaki

1 Answers

drstein

Recent Activity

Donate For Us

On Spark's RDD's take and takeOrdered methods

Tags:

scala

apache-spark

oikonomiyaki

1 Answers

drstein

Related questions

Recent Activity

Donate For Us