
Get a registered Spark Accumulator by name

Is there a way of getting a registered Spark Accumulator by name, without passing an actual reference? Desired behavior:

val cnt1 = sc.longAccumulator("cnt1")
val cnt2 = something.getAccumulatorByName("cnt1").asInstanceOf[LongAccumulator]
cnt1.add(1)
cnt2.value // returns 1

Thanks

Dima Ogurtsov asked Jan 18 '18 14:01




1 Answer

Spark keeps registered accumulators in AccumulatorContext, but that registry is internal and gives you no public way to look one up by name. Spark does not allow this because accumulators do not live until you stop the SparkContext. Spark implements a canonicalizing mapping: an accumulator stays registered only while you hold a strong reference to it, and as soon as it goes out of scope the GC cleans it up (with a special finalization process).

The only way to get an accumulator by name is to put it into a Map yourself. If, for example, you need to write to an accumulator in your FileFormat or RelationProvider and then read it on the driver, just keep a static reference to it. If you read and write accumulators in the same class and want to look them up by name, you most likely need a custom accumulator with a Map[String, Long] inside. That performs much better.
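
Both suggestions can be sketched roughly as follows. This is only an illustration, not Spark API: the names AccumulatorRegistry and NamedCountersAccumulator are made up for this example, and the code assumes Spark 2.x or later (AccumulatorV2). The registry lives in the driver JVM only, so tasks still need the accumulator captured in their closure. The custom accumulator is not synchronized.

```scala
import org.apache.spark.util.{AccumulatorV2, LongAccumulator}
import scala.collection.concurrent.TrieMap
import scala.collection.mutable

// Hypothetical helper: a static Map that keeps strong references to
// accumulators so they can be looked up by name later on the driver.
object AccumulatorRegistry {
  private val byName = TrieMap.empty[String, LongAccumulator]

  def put(name: String, acc: LongAccumulator): LongAccumulator = {
    byName.put(name, acc)
    acc
  }

  def get(name: String): Option[LongAccumulator] = byName.get(name)
}

// Hypothetical custom accumulator: one accumulator holding many named
// Long counters, instead of one accumulator per counter.
class NamedCountersAccumulator
    extends AccumulatorV2[(String, Long), Map[String, Long]] {

  private val counters = mutable.Map.empty[String, Long]

  override def isZero: Boolean = counters.isEmpty

  override def copy(): NamedCountersAccumulator = {
    val acc = new NamedCountersAccumulator
    acc.counters ++= counters
    acc
  }

  override def reset(): Unit = counters.clear()

  // Adds `delta` to the counter called `name`, creating it at 0 if absent.
  override def add(v: (String, Long)): Unit = {
    val (name, delta) = v
    counters(name) = counters.getOrElse(name, 0L) + delta
  }

  // Folds another accumulator's counters into this one, summing per name.
  override def merge(other: AccumulatorV2[(String, Long), Map[String, Long]]): Unit =
    other.value.foreach(add)

  // Immutable snapshot of the current counters.
  override def value: Map[String, Long] = counters.toMap
}
```

With a SparkContext sc, usage could look like AccumulatorRegistry.put("cnt1", sc.longAccumulator("cnt1")) followed later by AccumulatorRegistry.get("cnt1"). For the custom accumulator, register it with sc.register(counters, "counters"), call counters.add("cnt1" -> 1L) inside tasks, and read counters.value.getOrElse("cnt1", 0L) on the driver.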

Avseiytsev Dmitriy answered Sep 22 '22 00:09