Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reduce, fold or scan (Left/Right)?

People also ask

What is fold left and fold right?

The primary difference is the order in which the fold operation iterates through the collection in question. foldLeft starts on the left side—the first item—and iterates to the right; foldRight starts on the right side—the last item—and iterates to the left.

What is the difference between fold and reduce?

Fold and reduce The difference between the two functions is that fold() takes an initial value and uses it as the accumulated value on the first step, whereas the first step of reduce() uses the first and the second elements as operation arguments on the first step.

What is the difference between reduce and fold in spark?

fold calls fold on an iterator for each partition, then merges the results, reduce calls reduceLeft on the iterator for each partition then merges the result. The difference is that fold doesn't need to worry about empty partitions or collections, because then it will just use the zero value.

How does fold left work?

The foldLeft method takes an associative binary operator function as parameter and will use it to collapse elements from the collection. The order for traversing the elements in the collection is from left to right and hence the name foldLeft. The foldLeft method allows you to also specify an initial value.


In general, all 6 fold functions apply a binary operator to each element of a collection. The result of each step is passed on to the next step (as input to one of the binary operator's two arguments). This way we can cumulate a result.

reduceLeft and reduceRight cumulate a single result.

foldLeft and foldRight cumulate a single result using a start value.

scanLeft and scanRight cumulate a collection of intermediate cumulative results using a start value.

Accumulate

From LEFT and forwards...

With a collection of elements abc and a binary operator add we can explore what the different fold functions do when going forwards from the LEFT element of the collection (from A to C):

val abc = List("A", "B", "C")

def add(res: String, x: String) = { 
  println(s"op: $res + $x = ${res + x}")
  res + x
}

abc.reduceLeft(add)
// op: A + B = AB
// op: AB + C = ABC    // accumulates value AB in *first* operator arg `res`
// res: String = ABC

abc.foldLeft("z")(add) // with start value "z"
// op: z + A = zA      // initial extra operation
// op: zA + B = zAB
// op: zAB + C = zABC
// res: String = zABC

abc.scanLeft("z")(add)
// op: z + A = zA      // same operations as foldLeft above...
// op: zA + B = zAB
// op: zAB + C = zABC
// res: List[String] = List(z, zA, zAB, zABC) // maps intermediate results


From RIGHT and backwards...

If we start with the RIGHT element and go backwards (from C to A) we'll notice that now the second argument to our binary operator accumulates the result (the operator is the same, we just switched the argument names to make their roles clear):

def add(x: String, res: String) = {
  println(s"op: $x + $res = ${x + res}")
  x + res
}

abc.reduceRight(add)
// op: B + C = BC
// op: A + BC = ABC  // accumulates value BC in *second* operator arg `res`
// res: String = ABC

abc.foldRight("z")(add)
// op: C + z = Cz
// op: B + Cz = BCz
// op: A + BCz = ABCz
// res: String = ABCz

abc.scanRight("z")(add)
// op: C + z = Cz
// op: B + Cz = BCz
// op: A + BCz = ABCz
// res: List[String] = List(ABCz, BCz, Cz, z)

.

De-cumulate

From LEFT and forwards...

If instead we were to de-cumulate some result by subtraction starting from the LEFT element of a collection, we would cumulate the result through the first argument res of our binary operator minus:

val xs = List(1, 2, 3, 4)

def minus(res: Int, x: Int) = {
  println(s"op: $res - $x = ${res - x}")
  res - x
}

xs.reduceLeft(minus)
// op: 1 - 2 = -1
// op: -1 - 3 = -4  // de-cumulates value -1 in *first* operator arg `res`
// op: -4 - 4 = -8
// res: Int = -8

xs.foldLeft(0)(minus)
// op: 0 - 1 = -1
// op: -1 - 2 = -3
// op: -3 - 3 = -6
// op: -6 - 4 = -10
// res: Int = -10

xs.scanLeft(0)(minus)
// op: 0 - 1 = -1
// op: -1 - 2 = -3
// op: -3 - 3 = -6
// op: -6 - 4 = -10
// res: List[Int] = List(0, -1, -3, -6, -10)


From RIGHT and backwards...

But look out for the xRight variations now! Remember that the (de-)cumulated value in the xRight variations is passed to the second parameter res of our binary operator minus:

def minus(x: Int, res: Int) = {
  println(s"op: $x - $res = ${x - res}")
  x - res
}

xs.reduceRight(minus)
// op: 3 - 4 = -1
// op: 2 - -1 = 3  // de-cumulates value -1 in *second* operator arg `res`
// op: 1 - 3 = -2
// res: Int = -2

xs.foldRight(0)(minus)
// op: 4 - 0 = 4
// op: 3 - 4 = -1
// op: 2 - -1 = 3
// op: 1 - 3 = -2
// res: Int = -2

xs.scanRight(0)(minus)
// op: 4 - 0 = 4
// op: 3 - 4 = -1
// op: 2 - -1 = 3
// op: 1 - 3 = -2
// res: List[Int] = List(-2, 3, -1, 4, 0) 

The last List(-2, 3, -1, 4, 0) is maybe not what you would intuitively expect!

As you see, you can check what your foldX is doing by simply running a scanX instead and debug the cumulated result at each step.

Bottom line

  • Cumulate a result with reduceLeft or reduceRight.
  • Cumulate a result with foldLeft or foldRight if you have a start value.
  • Cumulate a collection of intermediate results with scanLeft or scanRight.

  • Use a xLeft variation if you want to go forwards through the collection.

  • Use a xRight variation if you want to go backwards through the collection.

Normally REDUCE,FOLD,SCAN method works by accumulating data on LEFT and keep on changing the RIGHT variable. Main difference between them is REDUCE,FOLD is:-

Fold will always start with a seed value i.e. user defined starting value. Reduce will throw a exception if collection is empty where as fold gives back the seed value. Will always result a single value.

Scan is used for some processing order of items from left or right hand side, then we can make use of previous result in subsequent calculation. That means we can scan items. Will always result a collection.

  • LEFT_REDUCE method works similar to REDUCE Method.
  • RIGHT_REDUCE is opposite to reduceLeft one i.e. it accumulates values in RIGHT and keep on changing the left variable.

  • reduceLeftOption and reduceRightOption are similar to left_reduce and right_reduce only difference is they return results in OPTION object.

A part of output for below mentioned code would be :-

using scan operation over a list of numbers (using seed value 0) List(-2,-1,0,1,2)

  • {0,-2}=>-2 {-2,-1}=>-3 {-3,0}=>-3 {-3,1}=>-2 {-2,2}=>0 scan List(0, -2, -3, -3, -2, 0)

  • {0,-2}=>-2 {-2,-1}=>-3 {-3,0}=>-3 {-3,1}=>-2 {-2,2}=>0 scanLeft (a+b) List(0, -2, -3, -3, -2, 0)

  • {0,-2}=>-2 {-2,-1}=>-3 {-3,0}=>-3 {-3,1}=>-2 {-2,2}=>0 scanLeft (b+a) List(0, -2, -3, -3, -2, 0)

  • {2,0}=>2 {1,2}=>3 {0,3}=>3 {-1,3}=>2 {-2,2}=>0 scanRight (a+b) List(0, 2, 3, 3, 2, 0)

  • {2,0}=>2 {1,2}=>3 {0,3}=>3 {-1,3}=>2 {-2,2}=>0 scanRight (b+a) List(0, 2, 3, 3, 2, 0)

using reduce,fold operations over a list of Strings List("A","B","C","D","E")

  • {A,B}=>AB {AB,C}=>ABC {ABC,D}=>ABCD {ABCD,E}=>ABCDE reduce (a+b) ABCDE
  • {A,B}=>AB {AB,C}=>ABC {ABC,D}=>ABCD {ABCD,E}=>ABCDE reduceLeft (a+b) ABCDE
  • {A,B}=>BA {BA,C}=>CBA {CBA,D}=>DCBA {DCBA,E}=>EDCBA reduceLeft (b+a) EDCB
  • {D,E}=>DE {C,DE}=>CDE {B,CDE}=>BCDE {A,BCDE}=>ABCDE reduceRight (a+b) ABCDE
  • {D,E}=>ED {C,ED}=>EDC {B,EDC}=>EDCB {A,EDCB}=>EDCBA reduceRight (b+a) EDCBA

Code :

object ScanFoldReduce extends App {

    val list = List("A","B","C","D","E")
            println("reduce (a+b) "+list.reduce((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  ")
                a+b
            }))

            println("reduceLeft (a+b) "+list.reduceLeft((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  ")
                a+b
            }))

            println("reduceLeft (b+a) "+list.reduceLeft((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  " )
                b+a
            }))

            println("reduceRight (a+b) "+list.reduceRight((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))

            println("reduceRight (b+a) "+list.reduceRight((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  ")
                b+a
            }))

            println("scan            "+list.scan("[")((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))
            println("scanLeft (a+b)  "+list.scanLeft("[")((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))
            println("scanLeft (b+a)  "+list.scanLeft("[")((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  " )
                b+a
            }))
            println("scanRight (a+b) "+list.scanRight("[")((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))
            println("scanRight (b+a) "+list.scanRight("[")((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  " )
                b+a
            }))
//Using numbers
     val list1 = List(-2,-1,0,1,2)

            println("reduce (a+b) "+list1.reduce((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  ")
                a+b
            }))

            println("reduceLeft (a+b) "+list1.reduceLeft((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  ")
                a+b
            }))

            println("reduceLeft (b+a) "+list1.reduceLeft((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  " )
                b+a
            }))

            println("      reduceRight (a+b) "+list1.reduceRight((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))

            println("      reduceRight (b+a) "+list1.reduceRight((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  ")
                b+a
            }))

            println("scan            "+list1.scan(0)((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))

            println("scanLeft (a+b)  "+list1.scanLeft(0)((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b
            }))

            println("scanLeft (b+a)  "+list1.scanLeft(0)((a,b)=>{
                print("{"+a+","+b+"}=>"+ (b+a)+"  " )
                b+a
            }))

            println("scanRight (a+b)         "+list1.scanRight(0)((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                a+b}))

            println("scanRight (b+a)         "+list1.scanRight(0)((a,b)=>{
                print("{"+a+","+b+"}=>"+ (a+b)+"  " )
                b+a}))
}

For the collection x with elements x0, x1, x2, x3 and an arbitrary function f you have the following:

1. x.reduceLeft    (f) is f(f(f(x0,x1),x2),x3) - notice 3 function calls
2. x.reduceRight   (f) is f(f(f(x3,x2),x1),x0) - notice 3 function calls
3. x.foldLeft (init,f) is f(f(f(f(init,x0),x1),x2),x3) - notice 4 function calls
4. x.foldRight(init,f) is f(f(f(f(init,x3),x2),x1),x0) - notice 4 function calls
5. x.scanLeft (init,f) is f(init,x0)=g0
                          f(f(init,x0),x1) = f(g0,x1) = g1
                          f(f(f(init,x0),x1),x2) = f(g1,x2) = g2
                          f(f(f(f(init,x0),x1),x2),x3) = f(g2,x3) = g3
                          - notice 4 function calls but also 4 emitted values
                          - last element is identical with foldLeft
6. x.scanRight (init,f) is f(init,x3)=h0
                          f(f(init,x3),x2) = f(h0,x2) = h1
                          f(f(f(init,x3),x2),x1) = f(h1,x1) = h2
                          f(f(f(f(init,x3),x2),x1),x0) = f(h2,x0) = h3
                          - notice 4 function calls but also 4 emitted values
                          - last element is identical with foldRight

In conclusion

  • scan is like fold but also emits all intermediate values
  • reduce doesn't need an initial value which sometimes is a little harder to find
  • fold needs an initial value that is a little harder to find:
    • 0 for sums
    • 1 for products
    • first element for min (some might suggest Integer.MAX_VALUE)
  • not 100% sure but it looks like there are these equivalent implementations:
    • x.reduceLeft(f) === x.drop(1).foldLeft(x.head,f)
    • x.foldRight(init,f) === x.reverse.foldLeft(init,f)
    • x.foldLeft(init,f) === x.scanLeft(init,f).last