In Spark's RDDs and DStreams we have the 'reduce' function for transforming an entire RDD into one element. However the reduce function takes (T,T) => T However if we want to reduce a List in Scala we can use foldLeft or foldRight which takes type (B)( (B,A) => B) This is very useful because you start folding with a type other then what is in your list.
Is there a way in Spark to do something similar? Where I can start with a value that is of different type then the elements in the RDD itself
The Scala foldLeft method can be used to iterate over a data structure and perform multiple operations on a Spark DataFrame. foldLeft can be used to eliminate all whitespace in multiple columns or convert all the column names in a DataFrame to snake_case.
The primary difference is the order in which the fold operation iterates through the collection in question. foldLeft starts on the left side—the first item—and iterates to the right; foldRight starts on the right side—the last item—and iterates to the left. fold goes in no particular order.
In order for Spark to become a leader in computational speed, it needed to incorporate operational parallelism. Parallelism will ultimately be the reason foldLeft is not found on the RDD class.
foldLeft() method is a member of TraversableOnce trait, it is used to collapse elements of collections. It navigates elements from Left to Right order. It is primarily used in recursive functions and prevents stack overflow exceptions.
Use aggregate instead of reduce. It allows you also to specify a "zero" value of type B and a function like the one you want: (B,A) => B. Do note that you also need to merge separate aggregations done on separate executors, so a (B, B) => B function is also required.
Alternatively, if you want this aggregation as a side effect, an option is to use an accumulator. In particular, the accumulable type allows for the result type to be of a different type than the accumulating type.
Also, if you even need to do the same with a key-value RDD, use aggregateByKey.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With