
Cannot load main class from JAR file in Spark Submit

I am trying to run a Spark job. This is my shell script, which is located at /home/full/path/to/file/shell/my_shell_script.sh:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
        --master yarn-client \
        --num-executors $executors \
        --executor-memory $memory \
        --py-files /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation

When I run this, I get an error that says:

Error: Cannot load main class from JAR file: /home/full/path/to/file/shell/my_function_in_python

My impression here is that it is looking in the wrong place (the python file is located in the python directory, not the shell directory).

Katya Willard asked Dec 10 '15 18:12


2 Answers

The --py-files flag is for additional Python file dependencies that your program uses. As SparkSubmit.scala shows, spark-submit uses the so-called "primary argument", meaning the first non-flag argument, to decide between "submit jarfile" mode and "submit python main" mode.

That's why it tries to load your "$entry_function" as a jar file that doesn't exist: it assumes you're running Python only if the primary argument ends with ".py", and otherwise defaults to assuming you have a .jar file.
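To make the rule concrete, here is a rough Python model of the decision described above. It is not Spark's actual code: the helper names primary_argument and submit_mode are made up for illustration, and the sketch assumes every option starting with "--" takes exactly one value, which is true for the flags in the question but not for spark-submit in general.

```python
def primary_argument(args):
    """Return the first argument that is neither a flag nor a flag's value.

    Simplifying assumption: every option starting with "--" takes one value.
    """
    i = 0
    while i < len(args):
        if args[i].startswith("--"):
            i += 2  # skip the flag and the value that follows it
        else:
            return args[i]
    return None


def submit_mode(primary):
    """Apply the rule from the answer: ".py" means Python, anything else a JAR."""
    return "python" if primary.endswith(".py") else "jar"


# The arguments from the question, with the shell variables already expanded.
question_args = [
    "--master", "yarn-client",
    "--num-executors", "8",
    "--executor-memory", "2G",
    "--py-files", "/home/full/path/to/file/python/my_python_file.py",
    "my_function_in_python",
    "../conf/my_config_file.conf",
]

primary = primary_argument(question_args)
print(primary, submit_mode(primary))  # prints: my_function_in_python jar
```

Because the .py path is consumed as the value of --py-files, the next token, my_function_in_python, becomes the primary argument and is treated as a JAR, which matches the error message in the question.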

Instead of using --py-files, make /home/full/path/to/file/python/my_python_file.py the primary argument. Then you can either write some Python that takes the "entry function" as a program argument, or simply call your entry function from the main function inside the Python file itself.
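One way to take the entry function as a program argument is a small lookup table inside my_python_file.py. This is only a sketch: the body of my_function_in_python is a placeholder, and the ENTRY_POINTS table and dispatch helper are hypothetical names, not anything Spark provides.

```python
def my_function_in_python(conf_location):
    """Placeholder for the real job; it only reports which config it received."""
    return "running with config " + conf_location


# Functions that may be selected by name from the command line.
ENTRY_POINTS = {
    "my_function_in_python": my_function_in_python,
}


def dispatch(argv):
    """Look up the entry function named in argv[0] and call it with argv[1]."""
    if len(argv) != 2:
        raise ValueError("expected: <entry_function> <conf_location>")
    entry_name, conf_location = argv
    if entry_name not in ENTRY_POINTS:
        raise ValueError("unknown entry function: " + entry_name)
    return ENTRY_POINTS[entry_name](conf_location)


# Under spark-submit this would be dispatch(sys.argv[1:]); the list is
# spelled out here so the sketch runs on its own.
print(dispatch(["my_function_in_python", "../conf/my_config_file.conf"]))
```

With this in place, the last line of the spark-submit command would name the script first, followed by its arguments: /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation, with no --py-files flag in front of it.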

Alternatively, you can keep --py-files and create a new main .py file that calls your entry function, then pass that main .py file as the primary argument instead.

Dennis Huo answered Sep 28 '22 07:09


When adding elements to --py-files, separate them with commas and leave no spaces between them. Try this:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
        --master yarn-client \
        --num-executors $executors \
        --executor-memory $memory \
        --py-files /home/full/path/to/file/python/my_python_file.py,$entry_function,$confLocation

Tree DR answered Sep 28 '22 08:09