 

Packaging a PySpark project like a jar

I have a PySpark project with a Python script that runs Spark Streaming. It has some external dependencies, which I currently pull in with the --packages flag.
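For illustration, a typical invocation looks something like this (the master and package coordinates below are placeholders, not my exact setup):

    # dependencies are resolved from Maven Central at submit time
    spark-submit \
      --master yarn \
      --packages org.apache.spark:spark-streaming-kafka-0-10_2.12:3.5.0 \
      my_streaming_job.py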

In Scala, however, we can use Maven to download all required packages, build a single jar containing the main Spark program together with everything it needs, and then just use spark-submit to send it to the cluster (YARN in my case).

Is there anything similar to a jar for PySpark?

The official Spark documentation doesn't cover this. It only mentions running spark-submit <python-file> or adding --py-files, which doesn't feel as clean as a single jar.

Any suggestions would be helpful! Thanks!

asked Sep 14 '25 by HackCode


1 Answer

The documentation says you can use a .zip or .egg file.

For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.

Source
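A minimal sketch of that approach, assuming the project code lives in a src/ directory, its pip dependencies are listed in requirements.txt, and main.py is the entry point (all three names are placeholders):

    # bundle your own modules into a zip
    cd src && zip -r ../app.zip . && cd ..

    # bundle pure-Python third-party dependencies the same way
    pip install -r requirements.txt -t deps
    cd deps && zip -r ../deps.zip . && cd ..

    # ship both archives alongside the entry-point script
    spark-submit \
      --master yarn \
      --py-files app.zip,deps.zip \
      main.py

One caveat: this only works for pure-Python dependencies. Packages with compiled extensions (numpy, for example) can't be imported from a zip and have to be installed on the executor nodes themselves.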

You might also find the other spark-submit parameters useful.

answered Sep 16 '25 by OneCricketeer