Spark / Scala: Passing RDD to Function

Question

I am curious what exactly passing an RDD to a function does in Spark.

import org.apache.spark.rdd.RDD

def my_func(x: RDD[String]): RDD[String] = {
  // do_something_here (placeholder for the function body)
  x
}

Suppose we define a function as above. When we call the function and pass an existing RDD[String] object as the input parameter, does my_func make a "copy" of this RDD for the function parameter? In other words, is it called by reference or called by value?

marios · Accepted Answer

In Scala nothing gets copied (in the sense of the pass-by-value you have in C/C++) when an object is passed around; the function receives a reference to the same object. Most of the basic types (Int, String, Double, etc.) are immutable, so sharing them by reference is very safe. (Note: if you pass a mutable object and change it, then anyone holding a reference to that object will see the change.)
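To make the point about shared references concrete, here is a minimal plain-Scala sketch (no Spark needed). The object and helper names (ReferenceDemo, appendBang, shout) are made up for illustration and are not part of the question.

```scala
import scala.collection.mutable.ArrayBuffer

object ReferenceDemo {
  // Hypothetical helper: mutates the buffer it receives, so the caller's
  // buffer changes too (both names refer to the same object).
  def appendBang(buf: ArrayBuffer[String]): Unit = {
    buf += "!"
  }

  // Hypothetical helper: String is immutable, so this can only build
  // and return a new value; the argument itself is never modified.
  def shout(s: String): String = s.toUpperCase

  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer("a", "b")
    appendBang(buf)
    println(buf) // the caller sees the element added inside appendBang

    val s = "hello"
    val t = shout(s)
    println(s) // still the original string
    println(t) // the new, upper-cased string
  }
}
```

An RDD behaves like the String case here: it exposes no mutating operations, so a function can only derive a new RDD from the one it is given.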

On top of that, RDDs are lazy, distributed, immutable collections. Passing RDDs through functions and applying transformations to them (map, filter, etc.) doesn't transfer any data or trigger any computation.

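All chained transformations are "remembered" and will automatically get triggered in the right order when you invoke an action on the RDD, such as writing it out to storage (e.g. saveAsTextFile) or collecting it locally at the driver (through collect(), take(n), etc.). Note that persist() and cache() are themselves lazy: they only mark the RDD to be kept once an action computes it.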
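As a rough sketch of the laziness described above, the following assumes a local Spark installation with the spark-core dependency on the classpath. The names LazyRddDemo and myFunc and the sample data are made up for illustration; myFunc merely stands in for the question's my_func.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LazyRddDemo {
  // Hypothetical stand-in for my_func: it only describes transformations
  // and returns a new RDD; the input RDD is neither copied nor modified.
  def myFunc(x: RDD[String]): RDD[String] =
    x.filter(_.nonEmpty).map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setAppName("lazy-rdd-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val input: RDD[String] = sc.parallelize(Seq("a", "", "b"))

      // Passing the RDD and chaining filter/map runs no Spark job yet.
      val output = myFunc(input)

      // collect() is an action: only now are filter and map executed.
      val result = output.collect()
      println(result.mkString(","))

      // The original RDD still describes all three input elements.
      println(input.count())
    } finally {
      sc.stop()
    }
  }
}
```

Calling myFunc repeatedly is therefore cheap: each call only extends the lineage, and the cost is paid when an action runs.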

Tags:

scala

apache-spark

rdd

Jes
