
How to broadcast a large variable to the local disk of each node in Spark

As far as I know, broadcast is useful for giving each worker a local copy of a variable, and the variable must fit in the worker's memory.

In my case, however, I want a local copy of a large variable that does not fit in the worker's memory.

How can I distribute this large variable to each node without using Spark's broadcast function?

S. Jun asked Nov 26 '25 23:11


1 Answer

"large variable which is not fit in worker's memory"

As Ram mentioned above, if the variable doesn't fit in the worker's memory, there is no way to use it, even if you could broadcast it.

If you're trying to do lookups against a large dataset, you can create a database connection pool on each worker node. If you have a model, you can save it to each worker node and read it from file inside foreachPartition. Depending on your use case, there may be other solutions.
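As a rough illustration of the per-partition idea, here is a minimal sketch in plain Python. It assumes the large dataset has been written to an on-disk SQLite file that exists at the same path on every worker; the table name `lookup`, the helper names, and the file path are all hypothetical. The point is that each partition opens the file once and queries it from disk, so the data never has to fit in memory. One possible way to get the file onto each node (not something the answer itself prescribes) is `SparkContext.addFile` together with `SparkFiles.get`.

```python
import sqlite3
from typing import Callable, Iterable, Iterator, Optional, Tuple


def build_lookup_db(path: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Write key/value pairs to an on-disk SQLite file.

    Hypothetical stand-in for whatever produces the large dataset.
    """
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS lookup (k TEXT PRIMARY KEY, v TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO lookup VALUES (?, ?)", pairs)
    conn.close()


def make_partition_lookup(
    db_path: str,
) -> Callable[[Iterator[str]], Iterator[Tuple[str, Optional[str]]]]:
    """Return a function shaped for rdd.mapPartitions (assumed usage)."""

    def lookup_partition(keys: Iterator[str]) -> Iterator[Tuple[str, Optional[str]]]:
        # Open the local file once per partition, not once per record.
        conn = sqlite3.connect(db_path)
        try:
            for key in keys:
                row = conn.execute(
                    "SELECT v FROM lookup WHERE k = ?", (key,)
                ).fetchone()
                # Keys missing from the file map to None.
                yield key, (row[0] if row else None)
        finally:
            conn.close()

    return lookup_partition
```

With Spark, this would be used roughly as `rdd.mapPartitions(make_partition_lookup("/some/local/path/lookup.db"))`, where the path is a placeholder for wherever the file lives on each worker. The same open-once-per-partition shape applies to `foreachPartition` when no result needs to be returned, and to a database connection instead of a local file.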

moon answered Nov 29 '25 12:11


