 

What is the difference between transformations and RDD functions in Spark?

I am reading Spark textbooks. I see transformations and actions, and then I also read about RDD functions, so I am confused. Can anyone explain the basic difference between transformations and Spark RDD functions?

Both are used to change the RDD's data contents and return a new RDD, but I want a precise explanation.

asked Mar 09 '23 03:03 by j pavan kumar



2 Answers

Spark RDD functions include both transformations and actions. A transformation is a function that produces a new RDD from an existing one (RDDs are immutable, so the original RDD is not modified), and an action is a function that does not produce a new RDD but instead returns an output.
For example:
`map`, `filter`, `union`, etc. are all transformations, as they derive new data from the existing data. `reduce`, `collect`, `count` are all actions, as they give an output and do not change the data. For more info, visit Spark and Jacek.
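To make the split concrete, here is a small pure-Python toy model. It is only a sketch: `ToyRDD` is a hypothetical class, not Spark's implementation, that borrows the names of the operations listed above (in PySpark the real methods are `rdd.map`, `rdd.filter`, `rdd.union`, `rdd.collect`, `rdd.count` and `rdd.reduce`). It also reflects one detail worth knowing: in Spark, transformations are lazy. They only describe how to build the new RDD, and nothing is computed until an action is called.

```python
# Toy model of Spark's split between transformations and actions.
# ToyRDD and its methods are hypothetical; they only mimic the names
# of the real RDD operations (map, filter, union, collect, count, reduce).
import functools
import itertools
from typing import Any, Callable, Iterable, Iterator, List


class ToyRDD:
    """An immutable, lazily evaluated dataset (illustration only)."""

    def __init__(self, source: Callable[[], Iterator[Any]]):
        # `source` rebuilds the element stream on demand; storing it runs nothing.
        self._source = source

    @staticmethod
    def parallelize(data: Iterable[Any]) -> "ToyRDD":
        items = list(data)  # snapshot the input
        return ToyRDD(lambda: iter(items))

    # Transformations: each returns a NEW ToyRDD and computes nothing yet.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    def union(self, other: "ToyRDD") -> "ToyRDD":
        return ToyRDD(lambda: itertools.chain(self._source(), other._source()))

    # Actions: run the whole chain and return a plain Python value.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        return functools.reduce(f, self._source())


calls = []


def square(x: int) -> int:
    calls.append(x)  # records when the mapped function actually runs
    return x * x


nums = ToyRDD.parallelize([1, 2, 3, 4])
squares = nums.map(square)                          # transformation
evens = squares.filter(lambda x: x % 2 == 0)        # transformation
combined = evens.union(ToyRDD.parallelize([100]))   # transformation
print(calls)               # [] : no transformation has run yet
print(combined.collect())  # [4, 16, 100] : the action triggers the work
print(combined.count())    # 3
print(nums.collect())      # [1, 2, 3, 4] : the source dataset is unchanged
```

Storing a zero-argument `source` function instead of a list is what makes the model lazy. A side effect is that every action recomputes the chain from scratch, which is also what Spark does for an RDD that has not been cached.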

answered Mar 11 '23 17:03 by Ramesh Maharjan



RDDs support only two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

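"RDD functions" is a generic term used in textbooks for this internal mechanism, i.e. the operations themselves; it is not a third kind of operation alongside transformations and actions.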

For example, `map` is a transformation that passes each dataset element through a function and returns a new RDD representing the results. `reduce` is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program.
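The `map`/`reduce` example can be sketched the same way with partitions added. This is a hedged toy, assuming the dataset is split into plain Python lists that stand in for partitions held on executors; `PartitionedDataset` is a hypothetical class, not Spark's code. It shows why `reduce` counts as an action: its result is an ordinary value handed back to the caller (the "driver"), not another dataset. In PySpark the equivalent would be roughly `sc.parallelize([1, 2, 3, 4, 5], 2).map(lambda x: x * 2).reduce(lambda a, b: a + b)`.

```python
# Toy sketch of `map` as a lazy transformation and `reduce` as an action
# that hands one value back to the driver. PartitionedDataset is
# hypothetical; partitions are plain lists standing in for executor data.
import functools
from typing import Any, Callable, List, Optional


class PartitionedDataset:
    def __init__(self, partitions: List[List[Any]],
                 pending: Optional[List[Callable[[Any], Any]]] = None):
        self._partitions = partitions
        self._pending = pending or []  # queued per-element functions

    def map(self, f: Callable[[Any], Any]) -> "PartitionedDataset":
        # Transformation: remember f and return a new dataset; no work yet.
        return PartitionedDataset(self._partitions, self._pending + [f])

    def _compute(self, part: List[Any]):
        # Apply every queued function to each element of one partition.
        for x in part:
            for f in self._pending:
                x = f(x)
            yield x

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        # Action: reduce each non-empty partition locally, then merge the
        # partial results "on the driver" and return a single value.
        # f should be associative and commutative, since partitions may
        # be combined in any order.
        partials = [functools.reduce(f, self._compute(p))
                    for p in self._partitions if p]
        if not partials:
            raise ValueError("cannot reduce an empty dataset")
        return functools.reduce(f, partials)


data = PartitionedDataset([[1, 2], [3, 4, 5]])
doubled = data.map(lambda x: x * 2)          # transformation: returns a dataset
total = doubled.reduce(lambda a, b: a + b)   # action: returns a plain int
print(total)  # 30
```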

answered Mar 11 '23 18:03 by Manish Saraf Bhardwaj
