Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Operate on neighbor elements in RDD in Spark

As I have a collection:

List(1, 3,-1, 0, 2, -4, 6)

It's easy to make it sorted as:

List(-4, -1, 0, 1, 2, 3, 6)

Then I can construct a new collection by compute 6 - 3, 3 - 2, 2 - 1, 1 - 0, and so on like this:

for(i <- 0 to list.length -2) yield {
    list(i + 1) - list(i)
}

and get a vector:

Vector(3, 1, 1, 1, 1, 3)

That is, I want to make the next element minus the current element.

But how to implement this in RDD on Spark?

I know for the collection:

List(-4, -1, 0, 1, 2, 3, 6)

There will be some partitions of the collection, each partition is ordered, can I do the similar operation on each partition and collect results on each partition together?

like image 825
xring Avatar asked Dec 08 '15 02:12

xring


People also ask

Can we call operations on RDD?

RDD Operations. RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

How many RDDs can Cogroup () can work at once?

cogroup() can be used for much more than just implementing joins. We can also use it to implement intersect by key. Additionally, cogroup() can work on three or more RDDs at once.

What are RDD operations in Spark?

RDDs are the main logical data units in Spark. They are a distributed collection of objects, which are stored in memory or on disks of different machines of a cluster. A single RDD can be divided into multiple logical partitions so that these partitions can be stored and processed on different machines of a cluster.

How can you remove the elements with a key present in any other RDD?

You can use the subtractByKey () function to remove the elements with a key present in any other RDD.


1 Answers

The most efficient solution is to use sliding method:

import org.apache.spark.mllib.rdd.RDDFunctions._

val rdd = sc.parallelize(Seq(1, 3,-1, 0, 2, -4, 6))
  .sortBy(identity)
  .sliding(2)
  .map{case Array(x, y) => y - x}
like image 151
zero323 Avatar answered Sep 19 '22 01:09

zero323