Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala : fold vs foldLeft

Tags:

scala

reduce

fold

I am trying to understand how fold and foldLeft and the respective reduce and reduceLeft work. I used fold and foldLeft as my example

scala> val r = List((ArrayBuffer(1, 2, 3, 4),10)) scala> r.foldLeft(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1)  scala> res28: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(5)  scala> r.fold(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1) <console>:11: error: value _1 is not a member of Serializable with Equals               r.fold(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1) 

Why fold didn't work as foldLeft? What is Serializable with Equals? I understand fold and foldLeft has slight different API signature in terms of parameter generic types. Please advise. Thanks.

like image 447
thlim Avatar asked Apr 19 '13 18:04

thlim


People also ask

What is Scala foldLeft?

Advertisements. foldLeft() method is a member of TraversableOnce trait, it is used to collapse elements of collections. It navigates elements from Left to Right order. It is primarily used in recursive functions and prevents stack overflow exceptions.

What is foldLeft and foldRight in Scala?

The primary difference is the order in which the fold operation iterates through the collection in question. foldLeft starts on the left side—the first item—and iterates to the right; foldRight starts on the right side—the last item—and iterates to the left. fold goes in no particular order.

What is foldLeft in spark?

The Scala foldLeft method can be used to iterate over a data structure and perform multiple operations on a Spark DataFrame. foldLeft can be used to eliminate all whitespace in multiple columns or convert all the column names in a DataFrame to snake_case.

How does fold work in Scala?

The fold method takes two sets of arguments. One contains a start value and the other a combining function. It then steps through the list, recursively applying the function to two operands: an accumulated value and the next element in the list. In the first step, the start value, 0, acts as the first operand.


1 Answers

The method fold (originally added for parallel computation) is less powerful than foldLeft in terms of types it can be applied to. Its signature is:

def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1 

This means that the type over which the folding is done has to be a supertype of the collection element type.

def foldLeft[B](z: B)(op: (B, A) => B): B 

The reason is that fold can be implemented in parallel, while foldLeft cannot. This is not only because of the *Left part which implies that foldLeft goes from left to right sequentially, but also because the operator op cannot combine results computed in parallel -- it only defines how to combine the aggregation type B with the element type A, but not how to combine two aggregations of type B. The fold method, in turn, does define this, because the aggregation type A1 has to be a supertype of the element type A, that is A1 >: A. This supertype relationship allows in the same time folding over the aggregation and elements, and combining aggregations -- both with a single operator.

But, this supertype relationship between the aggregation and the element type also means that the aggregation type A1 in your example should be the supertype of (ArrayBuffer[Int], Int). Since the zero element of your aggregation is ArrayBuffer(1, 2, 4, 5) of the type ArrayBuffer[Int], the aggregation type is inferred to be the supertype of both of these -- and that's Serializable with Equals, the only least upper bound of a tuple and an array buffer.

In general, if you want to allow parallel folding for arbitrary types (which is done out of order) you have to use the method aggregate which requires defining how two aggregations are combined. In your case:

r.aggregate(ArrayBuffer(1, 2, 4, 5))({ (x, y) => x -- y._1 }, (x, y) => x intersect y) 

Btw, try writing your example with reduce/reduceLeft -- because of the supertype relationship between the element type and the aggregation type that both these methods have, you will find that it leads to a similar error as the one you've described.

like image 131
axel22 Avatar answered Sep 29 '22 07:09

axel22