I am trying to understand how fold and foldLeft and the respective reduce and reduceLeft work. I used fold and foldLeft as my example
scala> val r = List((ArrayBuffer(1, 2, 3, 4),10)) scala> r.foldLeft(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1) scala> res28: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(5) scala> r.fold(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1) <console>:11: error: value _1 is not a member of Serializable with Equals r.fold(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1)
Why fold
didn't work as foldLeft
? What is Serializable with Equals
? I understand fold and foldLeft has slight different API signature in terms of parameter generic types. Please advise. Thanks.
Advertisements. foldLeft() method is a member of TraversableOnce trait, it is used to collapse elements of collections. It navigates elements from Left to Right order. It is primarily used in recursive functions and prevents stack overflow exceptions.
The primary difference is the order in which the fold operation iterates through the collection in question. foldLeft starts on the left side—the first item—and iterates to the right; foldRight starts on the right side—the last item—and iterates to the left. fold goes in no particular order.
The Scala foldLeft method can be used to iterate over a data structure and perform multiple operations on a Spark DataFrame. foldLeft can be used to eliminate all whitespace in multiple columns or convert all the column names in a DataFrame to snake_case.
The fold method takes two sets of arguments. One contains a start value and the other a combining function. It then steps through the list, recursively applying the function to two operands: an accumulated value and the next element in the list. In the first step, the start value, 0, acts as the first operand.
The method fold
(originally added for parallel computation) is less powerful than foldLeft
in terms of types it can be applied to. Its signature is:
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
This means that the type over which the folding is done has to be a supertype of the collection element type.
def foldLeft[B](z: B)(op: (B, A) => B): B
The reason is that fold
can be implemented in parallel, while foldLeft
cannot. This is not only because of the *Left
part which implies that foldLeft
goes from left to right sequentially, but also because the operator op
cannot combine results computed in parallel -- it only defines how to combine the aggregation type B
with the element type A
, but not how to combine two aggregations of type B
. The fold
method, in turn, does define this, because the aggregation type A1
has to be a supertype of the element type A
, that is A1 >: A
. This supertype relationship allows in the same time folding over the aggregation and elements, and combining aggregations -- both with a single operator.
But, this supertype relationship between the aggregation and the element type also means that the aggregation type A1
in your example should be the supertype of (ArrayBuffer[Int], Int)
. Since the zero element of your aggregation is ArrayBuffer(1, 2, 4, 5)
of the type ArrayBuffer[Int]
, the aggregation type is inferred to be the supertype of both of these -- and that's Serializable with Equals
, the only least upper bound of a tuple and an array buffer.
In general, if you want to allow parallel folding for arbitrary types (which is done out of order) you have to use the method aggregate
which requires defining how two aggregations are combined. In your case:
r.aggregate(ArrayBuffer(1, 2, 4, 5))({ (x, y) => x -- y._1 }, (x, y) => x intersect y)
Btw, try writing your example with reduce
/reduceLeft
-- because of the supertype relationship between the element type and the aggregation type that both these methods have, you will find that it leads to a similar error as the one you've described.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With