Error: Must specify a primary resource (JAR or Python or R file) - IPython notebook

I am trying to run Apache Spark in IPython Notebook, following this instruction (and all the advice in the comments) - link

But when I start IPython Notebook with this command:

ipython notebook --profile=pyspark

I get this error:

Error: Must specify a primary resource (JAR or Python or R file)

If I run pyspark in the shell, everything works fine. That suggests the problem is in the connection between Spark and IPython.

By the way, this is my .bash_profile:

export SPARK_HOME="$HOME/spark-1.4.0"
export PYSPARK_SUBMIT_ARGS='--conf "spark.mesos.coarse=true" pyspark-shell'

And this is the content of ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:

# Configure the necessary Spark environment
import os
import sys

# Spark home
spark_home = os.environ.get("SPARK_HOME")

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")

# Add the py4j library to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
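For reference, when the notebook does start correctly, this quick check in the first cell shows whether the startup script ran (a sketch: sc only exists if shell.py was executed):

import os

# Should end with 'pyspark-shell' once 00-pyspark-setup.py has run
print(os.environ.get("PYSPARK_SUBMIT_ARGS"))

# 'sc' is the SparkContext predefined by pyspark/shell.py
print(sc.version)  # e.g. '1.4.0'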

And, in case it is relevant: yesterday I upgraded my OS X to 10.10.4.

asked Jul 02 '15 by Gilaztdinov Rustam

1 Answer

I had a similar problem, and I used the same 00-pyspark-setup.py file with spark-1.4.0.

As explained in Philippe Rossignol's comments on this blog, the following lines were added to the 00-pyspark-setup.py file because the argument pyspark-shell is needed in PYSPARK_SUBMIT_ARGS:

# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

However, inside my spark-1.4.0 folder there was no RELEASE file, so the if condition that appends pyspark-shell to PYSPARK_SUBMIT_ARGS was never satisfied.

As a kludgy solution, I just commented out the lines checking the release file, so only the following lines are left:

pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
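Putting it together, the startup file then looks roughly like this (a sketch using the same paths as in the question; the py4j zip version may differ in your install, and pyspark-shell is now appended unconditionally):

# ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys

spark_home = os.environ.get("SPARK_HOME")

# Make sure 'pyspark-shell' is passed as the primary resource,
# regardless of whether a RELEASE file exists
pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Add the Spark python sub-directory and py4j to the path
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))

After editing the startup file, restart the notebook with ipython notebook --profile=pyspark so the profile picks up the change.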
answered Nov 15 '22 by XValidated