Reading this and this makes me think it is possible to have a Python file executed by spark-submit, but I couldn't get it to work.
My setup is a bit complicated: I need several jars submitted together with my Python files for everything to work. The pyspark command that works is the following:
IPYTHON=1 ./pyspark --jars jar1.jar,/home/local/ANT/bogoyche/dev/rhine_workspace/env/Scala210-1.0/runtime/Scala2.10/scala-library.jar,jar2.jar --driver-class-path jar1.jar:jar2.jar
from sys import path
path.append('my-module')
from my_module import myfn  # placeholder name; a Python module name cannot contain a hyphen
myfn(myargs)
I have packaged my Python files in an egg that contains the main file, so the egg can be executed by calling python myegg.egg.
I am now trying to put together my spark-submit command and can't seem to get it right. Here's where I am:
./spark-submit --jars jar1.jar,jar2.jar --py-files path/to/my/egg.egg arg1 arg
Error: Cannot load main class from JAR file:/path/to/pyspark/directory/arg1
Run with --help for usage help or --verbose for debug output
Instead of executing my .egg file, it takes the first argument meant for the egg, treats it as a JAR file, and tries to load a class from it. What am I doing wrong?
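spark-submit treats the first positional argument after its options as the primary application file (a JAR or a .py file). --py-files only places files on the Python path; it does not name anything to run. Since your command line has no script, arg1 is taken as the application JAR, hence the error.
One way to fix this is to have a main driver program for your Spark application as a Python file (.py) that gets passed to spark-submit. This primary script is the entry point that the driver runs; it sets the configuration properties and initializes the SparkContext.
The modules bundled in the egg are dependencies: they are shipped to the executor nodes and imported by the driver program.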
You can write a small file to act as the main driver and run -
./spark-submit --jars jar1.jar,jar2.jar --py-files path/to/my/egg.egg driver.py arg1 arg
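The ordering in that command is what matters: options first, then the primary script, then the script's own arguments. As a rough sketch only, the helper below assembles the same command line from its parts. The function name build_submit_command and its parameters are made up for illustration and are not part of Spark; only the --jars and --py-files flags and the paths come from the command above.

```python
def build_submit_command(driver, app_args, jars=(), py_files=(),
                         spark_submit="./spark-submit"):
    """Assemble a spark-submit argument list (hypothetical helper).

    Order: spark-submit options, then the primary script, then the
    arguments that are handed to that script.
    """
    cmd = [spark_submit]
    if jars:
        # --jars takes a single comma-separated list
        cmd += ["--jars", ",".join(jars)]
    if py_files:
        # --py-files also takes a comma-separated list (.zip, .egg, .py)
        cmd += ["--py-files", ",".join(py_files)]
    # The first positional argument is the primary application file.
    cmd.append(driver)
    # Everything after it is passed through to the driver script.
    cmd.extend(app_args)
    return cmd


cmd = build_submit_command(
    driver="driver.py",
    app_args=["arg1", "arg"],
    jars=["jar1.jar", "jar2.jar"],
    py_files=["path/to/my/egg.egg"],
)
print(" ".join(cmd))
```

If driver.py were left out of this list, arg1 would become the first positional argument, which is the situation that produces the "Cannot load main class from JAR" error in the question.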
The driver program would be something like -
from pyspark import SparkContext, SparkConf
from my_module import myfn  # module shipped inside the egg via --py-files

if __name__ == '__main__':
    conf = SparkConf().setAppName("app")
    sc = SparkContext(conf=conf)
    myfn(myargs, sc)
Pass the SparkContext object as an argument wherever it is needed.
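The arguments placed after driver.py on the spark-submit command line reach the script through sys.argv, like in any Python program. Below is a minimal sketch of how the driver might forward them together with the SparkContext. It assumes that myfn accepts a list of strings plus the context, which the question does not confirm; split_argv and main are illustrative names, and my_module and myfn stand in for the placeholders used above.

```python
import sys


def split_argv(argv):
    """Return (script_name, app_args) from a sys.argv-style list."""
    return argv[0], list(argv[1:])


def main(argv):
    # Imported here so the argument handling can be tried without Spark.
    from pyspark import SparkConf, SparkContext
    # Placeholder module from the egg passed with --py-files.
    from my_module import myfn

    _, app_args = split_argv(argv)
    conf = SparkConf().setAppName("app")
    sc = SparkContext(conf=conf)
    try:
        # Assumption: myfn takes the argument list and the context.
        myfn(app_args, sc)
    finally:
        # Release cluster resources even if myfn raises.
        sc.stop()


# In driver.py, finish with the usual entry-point guard:
#     if __name__ == "__main__":
#         main(sys.argv)
```

With the command shown earlier, split_argv would hand ["arg1", "arg"] to myfn.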