I am reading Spark textbooks and I see transformations and actions, and then I also read about RDD functions, so I am confused. Can anyone explain the basic difference between transformations and Spark RDD functions?
Both are used to change the RDD data contents and return a new RDD, but I want to know the precise explanation.
Spark RDD functions include both transformations and actions. A transformation is a function that produces a new RDD from an existing one (RDDs are immutable, so the original is left unchanged), and an action is a function that does not produce a new RDD but returns an output.
For example, map, filter, union, etc. are all transformations, as they derive new data from the existing data. reduce, collect, and count are all actions, as they return an output rather than a new RDD.
For more info, visit Spark and Jacek.
RDDs support only two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.
"RDD functions" is a generic term used in textbooks for the internal mechanism.
For example, map is a transformation that passes each dataset element through a function and returns a new RDD representing the results. On the other hand, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program.
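To make the split concrete without needing a cluster, here is a minimal plain-Python sketch. ToyRDD and its `of` helper are hypothetical names invented for this illustration; they are not part of Spark, and the real RDD is distributed and far more involved. The sketch only mirrors the shape of the API: transformations hand back a new object and defer the work (Spark transformations are lazy in the same way), while actions run the computation and return an ordinary value.

```python
from functools import reduce as fold


class ToyRDD:
    """Hypothetical stand-in for an RDD, for illustration only (not Spark)."""

    def __init__(self, compute):
        # compute: zero-argument function returning an iterator of elements
        self._compute = compute

    @classmethod
    def of(cls, items):
        data = tuple(items)  # immutable snapshot of the input
        return cls(lambda: iter(data))

    # Transformations: return a NEW ToyRDD; nothing is computed yet.
    def map(self, f):
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    def union(self, other):
        def both():
            yield from self._compute()
            yield from other._compute()
        return ToyRDD(both)

    # Actions: run the deferred computation and return a plain value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def reduce(self, f):
        return fold(f, self._compute())


nums = ToyRDD.of([1, 2, 3, 4, 5])
squares = nums.map(lambda x: x * x)                  # transformation
even_squares = squares.filter(lambda x: x % 2 == 0)  # transformation
combined = nums.union(even_squares)                  # transformation

total = squares.reduce(lambda a, b: a + b)           # action, returns a number
n = combined.count()                                 # action, returns a number
print(combined.collect())                            # action, returns a list
print(nums.collect())                                # the source is unchanged
```

Note that map, filter, and union each give back another ToyRDD, so they can be chained, whereas reduce, count, and collect give back a number or a list and end the chain. In real Spark the same method names exist on RDDs, and the values returned by actions arrive at the driver program.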