Below I have a Scala example of a Spark fold action:
val rdd1 = sc.parallelize(List(1,2,3,4,5), 3)
rdd1.fold(5)(_ + _)
This produces the output 35. Can somebody explain in detail how this output gets computed?
Fold in Spark: fold is a very powerful operation in Spark that lets you calculate many important values in O(n) time. If you are familiar with Scala collections, it will feel like using fold on a collection. Even if you have not used fold in Scala before, this post will make you comfortable using it.
fold calls fold on the iterator of each partition and then merges the results; reduce calls reduceLeft on the iterator of each partition and then merges the results. The difference is that fold does not need to worry about empty partitions or collections, because it simply uses the zero value for them.
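To make the empty-collection point concrete, here is a minimal sketch (assuming an existing SparkContext named sc, as in the question): reduce throws on an empty RDD, while fold falls back to the zero value.

val empty = sc.parallelize(Seq.empty[Int], 2)
// empty.reduce(_ + _)  // throws UnsupportedOperationException: empty collection
empty.fold(0)(_ + _)    // returns 0: each empty partition just contributes the zero value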
Actions. Transformations create RDDs from other RDDs, but when we want to work with the actual dataset, an action is performed. When an action is triggered, no new RDD is formed, unlike with a transformation. Thus, actions are Spark RDD operations that return non-RDD values.
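As a small illustration (again assuming sc), map is a transformation that only builds a new RDD lazily, while fold is an action that actually runs the job and returns a plain value:

val nums = sc.parallelize(List(1, 2, 3, 4, 5))
val doubled = nums.map(_ * 2)       // transformation: returns a new RDD, nothing is computed yet
val total = doubled.fold(0)(_ + _)  // action: triggers the computation and returns an Int (30)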
In order for Spark to become a leader in computational speed, it needed to incorporate parallelism. Parallelism is ultimately the reason foldLeft is not found on the RDD class.
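A sketch of that difference (assuming sc): foldLeft on a Scala collection processes elements strictly left to right and applies the zero value exactly once, which cannot be split across partitions; RDD.fold requires an associative operator so partial results can be merged in any order, and it applies the zero value once per partition and once more when merging.

List(1, 2, 3, 4, 5).foldLeft(5)(_ + _)                 // 20: sequential, zero value used exactly once
sc.parallelize(List(1, 2, 3, 4, 5), 3).fold(5)(_ + _)  // 35: zero value used per partition and once when merging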
Taken from the Scaladocs here (emphasis mine):
@param zeroValue the initial value for the accumulated result of each partition for the op operator, and also the initial value for the combine results from different partitions for the op operator - this will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation)
The zeroValue is in your case added four times (once for each partition, plus once when combining the results from the partitions). So the result is:
(5 + 1) + (5 + 2 + 3) + (5 + 4 + 5) + 5 // (extra one for combining results)
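You can check this yourself with glom(), which exposes the per-partition contents (a sketch assuming the same sc; the exact split of the five elements over the three partitions depends on the partitioning, but for parallelize it is typically as shown):

val rdd1 = sc.parallelize(List(1, 2, 3, 4, 5), 3)
rdd1.glom().collect().map(_.toList)  // e.g. Array(List(1), List(2, 3), List(4, 5))
rdd1.fold(5)(_ + _)                  // (5 + 1) + (5 + 2 + 3) + (5 + 4 + 5) + 5 = 35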