Spark's accumulator is confusing me

I am practicing Apache Spark and ran into the following problem.

val accum = sc.accumulator(0, "My Accumulator")
println(accum)  // prints: 0

sc.parallelize(Array(1, 2, 3, 4, 5)).foreach(x => accum += x)
// sc.parallelize(Array(1, 2, 3, 4, 5)).foreach(x => accum = accum + x)
println(accum.value)  // prints: 15

The line sc.parallelize(Array(1, 2, 3, 4, 5)).foreach(x => accum += x) works fine, but the commented-out version below it does not compile. The difference is between:

x => accum += x

and

x => accum = accum + x

Why does the second one not work?
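
Half of the problem can be reproduced in plain Scala with no Spark involved; a minimal sketch:

val a = 0
// a = a + 1   // does not compile: reassignment to val

var b = 0
b = b + 1      // fine: b is a var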

fluency03 asked Dec 04 '15

People also ask

How do you use an accumulator in Spark?

An accumulator is created from an initial value v by calling SparkContext.accumulator(v). Tasks running on the cluster can then add to it using the add method or the += operator (in Scala and Python). However, they cannot read its value.
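
A minimal Scala sketch of that pattern, assuming a Spark 1.x spark-shell where sc is already defined:

val counter = sc.accumulator(0, "counter")

// Tasks on the cluster may only add to it:
sc.parallelize(1 to 10).foreach { x =>
  counter += x      // the += operator ...
  counter.add(x)    // ... or the equivalent add method
}

// Only the driver may read the result:
println(counter.value)  // 110 (each element was added twice)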

Are Spark accumulators read-only?

No: accumulators are write-only from a worker's perspective. They are shared variables that are only "added" to through an associative and commutative operation, which is what lets Spark support them efficiently in parallel; only the driver can read their value. They can be used to implement counters (as in MapReduce) or sums.
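
In other words, workers add and the driver reads. A sketch (the exact exception raised by a worker-side read varies by Spark version):

val acc = sc.accumulator(0)

sc.parallelize(1 to 5).foreach { x =>
  acc += x
  // println(acc.value)  // would throw: accumulator values can't be read in a task
}

println(acc.value)  // 15, readable on the driver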

How do you use an accumulator in a PySpark DataFrame?

In PySpark, an accumulator is created with the accumulator() function of the SparkContext class; accumulators for custom types can be created using PySpark's AccumulatorParam class. The sparkContext.accumulator() call defines the accumulator variable.
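
The same idea exists on the Scala side via the AccumulatorParam trait. A sketch of a custom-type accumulator, assuming the Spark 1.x API used elsewhere in this thread:

import org.apache.spark.AccumulatorParam

// Accumulate sets of strings instead of numbers.
implicit object StringSetParam extends AccumulatorParam[Set[String]] {
  def zero(initial: Set[String]): Set[String] = Set.empty
  def addInPlace(s1: Set[String], s2: Set[String]): Set[String] = s1 ++ s2
}

val seen = sc.accumulator(Set.empty[String])
sc.parallelize(Seq("a", "b", "a")).foreach(w => seen += Set(w))
println(seen.value)  // Set(a, b)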

What are accumulators, and when are they truly reliable?

To answer "When are accumulators truly reliable?": when they are updated inside an action. Per the documentation, for accumulator updates performed inside actions, Spark guarantees that each task's update is applied only once, even if the task is restarted; updates inside transformations carry no such guarantee.
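
A sketch of the distinction (Spark 1.x API; the transformation-side count is illustrative and assumes the RDD is not cached):

val inAction = sc.accumulator(0, "in action")
val inTransform = sc.accumulator(0, "in transformation")

// Inside an action: Spark guarantees each task's update applies once.
sc.parallelize(1 to 100).foreach(_ => inAction += 1)

// Inside a transformation: updates can apply more than once if the
// RDD is recomputed (task retries, or simply running two actions on it).
val mapped = sc.parallelize(1 to 100).map { x => inTransform += 1; x }
mapped.count()
mapped.count()  // recomputes the map stage

println(inAction.value)     // 100
println(inTransform.value)  // 200: each count() recomputed the map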


2 Answers

There are three reasons why it doesn't work (see the sketch after this list):

  1. accum is a val, so it cannot be reassigned.
  2. The Accumulable class, which is the base class of Accumulator, provides only a += method, not +.
  3. Accumulators are write-only from the worker's perspective, so you cannot read their value inside an action. In theory a + method could modify accum in place, but that would be rather confusing.
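
A quick spark-shell sketch of the first two points (Spark 1.x API):

val accum = sc.accumulator(0)   // a val, of type Accumulator[Int]

accum += 1        // compiles: Accumulable defines a real += method
accum.add(1)      // compiles: the equivalent named method

// accum = accum + 1   // does not compile: reassignment to val,
//                     // and Accumulable has no + method anyway

Note that accum += 1 works on a val precisely because += is a real method here; Scala only desugars a += b into a = a + b when a has no += method of its own.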
zero323 answered Nov 15 '22

Believe it or not, an Accumulator in Apache Spark works like a write-only global variable. In imperative thinking we see no difference between x += 1 and x = x + 1, but here there is a subtle one: in Apache Spark the second form would require reading the accumulator's value, which workers cannot do, while the first form only adds to it. Or, put more simply (as zero323 said in his explanation), the + method just isn't implemented for that class. You can read about how this works on p. 41 of the Apache Spark slides extracted from the Introduction to Big Data with Apache Spark course.
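
The write-only idea can be mimicked in plain Scala; a hypothetical sketch (WriteOnlyCounter is for illustration only, not a Spark class):

// Hypothetical class: adding needs no read access, and there is
// no + method and no public getter, so callers cannot read total.
class WriteOnlyCounter {
  private var total = 0
  def +=(n: Int): Unit = total += n
}

val c = new WriteOnlyCounter   // a val, like accum
c += 1                         // fine: calls the += method
// c = c + 1                   // fails: no +, and c cannot be reassigned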

Alberto Bonsanto answered Nov 15 '22