
What are the pros and cons of using broadcast variables in a singleton?

As I understand it, broadcast variables are created once but used many times. That suggests creating the broadcast variable inside a singleton class. Is this a bad idea? What are the pros and cons of using a singleton class to hold the broadcast variable?

moon asked Jan 06 '16 07:01


People also ask

What is the advantage of broadcast variables?

Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

How do broadcast variables improve performance?

Using broadcast variables can improve performance by reducing the amount of network traffic and data serialization required to execute your Spark application.

What is the benefit of broadcasting in Spark?

Broadcast variables let Spark developers keep a read-only variable cached on each node instead of shipping a copy of it with every task. This makes it possible to give each node a copy of a large input dataset without spending excessive time on network transfer.

Can we modify a broadcast variable?

Broadcast variables are read-only and cannot be modified after they are created. For a shared value that tasks need to update, Spark provides accumulators, which tasks can only add to.


1 Answer

I think broadcast variables deserve some explanation, to make it clearer how the mechanism works:

A broadcast variable lets you keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. It can be used, for example, to give every node a copy of a large input dataset in an efficient manner.
You can find further details at: Spark Broadcast variables
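To put rough numbers on "cached on each machine rather than shipped with tasks", here is a back-of-the-envelope sketch in plain Python. It does not use Spark at all; the lookup table, the task count, and the machine count are made-up figures for illustration, and real Spark adds its own serialization and compression on top.

```python
import pickle

# Hypothetical figures, chosen only for illustration.
lookup = {i: str(i) for i in range(10_000)}  # the "large" read-only value
num_tasks = 200
num_machines = 4

# Size in bytes of one serialized copy of the value.
payload = len(pickle.dumps(lookup))

# Without broadcasting: a copy travels with every task.
bytes_shipped_with_tasks = payload * num_tasks

# With broadcasting: roughly one copy per machine, reused by all tasks there.
bytes_broadcast = payload * num_machines

savings_factor = bytes_shipped_with_tasks / bytes_broadcast
print(f"one copy: {payload} bytes, savings factor: {savings_factor}")
```

Under these assumptions the saving is simply the ratio of tasks to machines, which is why broadcasting pays off mainly for large values used by many tasks.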

After you create a broadcast variable in Spark, you get a wrapper around the original value v (accessed by calling broadcastVar.value()). A singleton would therefore hold a reference to the wrapper rather than to the actual value, so the singleton pattern has little impact either way.
You could equally keep the same wrapper reference and pass it around for the entire life cycle of your Spark application.

Since both are references to a read-only variable, the two options behave similarly.
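To illustrate the point about references, here is a minimal sketch in plain Python. FakeBroadcast and BroadcastHolder are hypothetical names, not Spark classes: FakeBroadcast only mimics the idea of a wrapper exposing a read-only value (as a property here, rather than the value() call mentioned above), and BroadcastHolder is one possible singleton-style holder.

```python
class FakeBroadcast:
    """Hypothetical stand-in for a broadcast wrapper (not Spark's class)."""

    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value


class BroadcastHolder:
    """Hypothetical singleton-style holder: builds the wrapper once, then reuses it."""

    _instance = None

    @classmethod
    def get(cls, factory):
        # The factory runs only on first use; later calls return the cached wrapper.
        if cls._instance is None:
            cls._instance = factory()
        return cls._instance


calls = []


def make_broadcast():
    calls.append(1)  # record how many times the wrapper is actually created
    return FakeBroadcast({"a": 1, "b": 2})


via_singleton = BroadcastHolder.get(make_broadcast)
again = BroadcastHolder.get(make_broadcast)

# Keeping and passing the same wrapper explicitly gives the same object.
plain_reference = via_singleton
```

In both cases the code only holds a reference to the same wrapper object, so the choice is mostly about code organization. One thing to keep in mind in a real Spark job is that a singleton lives per process, so a holder like this would only be populated in the process where the wrapper was created.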

Leet-Falcon answered Oct 10 '22 04:10