 

How to distinguish whether an operation in Spark is a transformation or an action?

Tags:

apache-spark

I've recently started learning Spark and I'm confused about transformation and action operations. I've read the Spark documentation and some books about Spark, and I know that an action causes a Spark job to be executed on the cluster while a transformation does not. But the RDD operations listed in Spark's API docs don't state whether each one is a transformation or an action.

For example, reduce is an action, while reduceByKey is a transformation. Why is that?

Jun Wang asked Dec 19 '15 14:12


People also ask

What is difference between action and transformation in Spark?

Transformations create new RDDs from existing ones; an action is performed when we want to work with the actual dataset. Unlike a transformation, an action does not produce a new RDD when it is triggered. Actions are thus Spark RDD operations that return non-RDD values.

Is Spark read an action or transformation?

It's something that is in the optimization & performance aspect and cannot be seen as Action or Transformation.

What is transformation and action in Apache Spark?

In Spark, the role of a transformation is to create a new dataset from an existing one. Transformations are considered lazy because they are only computed when an action requires a result to be returned to the driver program.

What is the difference between a transformation and an action with regards to execution?

Transformations are functions that apply to RDDs and produce other RDDs as output (e.g. map, flatMap, filter, join, groupBy, ...). Actions are functions that apply to RDDs and either produce non-RDD output such as an Array, a List, or a plain value, or write results out (e.g. count, saveAsTextFile, foreach, collect, ...).


1 Answer

You can tell by looking at the return type. An action returns a non-RDD type (usually the type of the values stored in the RDD), whereas a transformation returns an RDD[Type], because it is still just a representation of your computation.
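To make the return-type rule concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark: the `MiniRDD` class and its `parallelize`, `map`, `reduce_by_key`, `reduce` and `collect` methods are hypothetical names that only mimic the shape of the real API. In the Scala API, `reduce` is declared as returning `T`, while `reduceByKey` is declared as returning `RDD[(K, V)]`. The sketch shows the same split: methods that return another `MiniRDD` run nothing, and methods that return a plain value force the computation.

```python
from functools import reduce as fold


class MiniRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark): holds a recipe, not results."""

    def __init__(self, compute):
        # compute is a zero-argument function that yields the elements on demand
        self._compute = compute

    @classmethod
    def parallelize(cls, data):
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new MiniRDD and run nothing yet.
    def map(self, f):
        return MiniRDD(lambda: (f(x) for x in self._compute()))

    def reduce_by_key(self, f):
        def compute():
            merged = {}
            for key, value in self._compute():
                merged[key] = f(merged[key], value) if key in merged else value
            return iter(merged.items())
        return MiniRDD(compute)

    # Actions: run the recipe and return a plain (non-MiniRDD) value.
    def reduce(self, f):
        return fold(f, self._compute())

    def collect(self):
        return list(self._compute())


calls = []  # records when the mapped function actually runs


def tag(word):
    calls.append(word)
    return (word, 1)


words = MiniRDD.parallelize(["a", "b", "a"])
pairs = words.map(tag)                             # transformation: MiniRDD
counts = pairs.reduce_by_key(lambda x, y: x + y)   # transformation: MiniRDD
calls_before_action = len(calls)                   # nothing has run so far
result = counts.collect()                          # action: plain list
total = MiniRDD.parallelize([1, 2, 3]).reduce(lambda x, y: x + y)  # action: plain int
```

This also suggests why the two similarly named operations differ: `reduce` collapses the whole dataset into a single value, so there is nothing left to keep distributed, whereas `reduce_by_key` produces one value per key, which is still a dataset and so stays a lazy `MiniRDD`.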

Justin Pihony answered Sep 28 '22 15:09