
How can you update a pyfile in the middle of a PySpark shell session?

Within an interactive PySpark session you can import Python files via sc.addPyFile('file_location'). If you then make changes to that file and save them, is there any way to "re-broadcast" the updated file without having to shut down your Spark session and start a new one?

Simply adding the file again doesn't work. I'm not sure if renaming the file works, but I don't want to do that anyway.

As far as I can tell from the Spark documentation, there is only a method to add a pyfile, not to update one. I'm hoping that I missed something!
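The "adding the file again doesn't work" behavior comes from Python's own module cache, independent of Spark: once a module has been imported, a second plain import hands back the cached object without rereading the file. A minimal local sketch (no Spark needed; the mymod module name is made up for this illustration):

```python
import os
import sys
import tempfile

# Write a throwaway module to a temp directory and make it importable.
workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
path = os.path.join(workdir, "mymod.py")

with open(path, "w") as f:
    f.write("VALUE = 1\n")

import mymod
print(mymod.VALUE)  # 1

# Edit the file on disk, then "add" it again with a plain import.
with open(path, "w") as f:
    f.write("VALUE = 2\n")

import mymod  # no-op: Python returns the cached module object
print(mymod.VALUE)  # still 1
```

On a cluster the situation is worse: even if the driver's cache were refreshed, copies already shipped to the executors would still hold the old source.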

Thanks

Jim asked Mar 02 '17


People also ask

How do you use PySpark in the Spark shell?

Go to the Spark installation directory from the command line, type bin/pyspark, and press Enter. This launches the PySpark shell and gives you a prompt to interact with Spark in Python. If Spark is already on your PATH, just enter pyspark in your terminal.

How do I get into the PySpark shell?

In addition, PySpark fully supports interactive use: simply run ./bin/pyspark to launch an interactive shell.

How do you add dependency in PySpark?

Using virtualenv: starting with Apache Spark 3.1, PySpark users can use virtualenv to manage Python dependencies on their clusters by using venv-pack, in a similar way to conda-pack. In Apache Spark 3.0 and lower versions, this approach can be used only with YARN.
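The venv-pack workflow sketched above roughly follows the Spark "Python Package Management" documentation; the environment, archive, and script names below are just examples, and the exact flags may vary by Spark version:

```shell
# Build and pack a virtualenv on the driver machine.
python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install venv-pack
venv-pack -o pyspark_venv.tar.gz

# Ship the packed environment to the executors; the archive is
# unpacked on each node under the alias after the '#'.
export PYSPARK_PYTHON=./environment/bin/python
spark-submit --archives pyspark_venv.tar.gz#environment my_app.py
```

Note this manages which dependencies the cluster sees at session start; it does not let you hot-swap a module inside a running session.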


1 Answer

I don't think this is feasible during an interactive session. You will have to restart your Spark session to pick up the modified module.
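There is, at best, a partial workaround on the driver side only: importlib.reload re-executes an already-imported module from its current source in the local interpreter. As far as I can tell, this does not re-ship anything that sc.addPyFile already distributed, so executors keep the old copy and a restart is still needed for cluster-side code. A minimal local sketch (the helpers module name is invented):

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files interfering with reload
workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
path = os.path.join(workdir, "helpers.py")

with open(path, "w") as f:
    f.write("def answer():\n    return 'old'\n")

import helpers
print(helpers.answer())  # old

# Edit the file, then force a re-execution of its source on the driver.
with open(path, "w") as f:
    f.write("def answer():\n    return 'updated'\n")

helpers = importlib.reload(helpers)
print(helpers.answer())  # updated
```

Keep in mind that objects created from the old module (already-defined functions captured in RDD closures, for instance) are not retroactively updated by a reload.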

Wen Yao answered Sep 23 '22