 

spark in yarn-cluster: 'sc' not defined

I am using Spark 1.3.1.

Do I have to declare sc when Spark runs in yarn-cluster mode? I have no problem running the same Python program in the PySpark shell.

This is how I submit the job :

/bin/spark-submit --master yarn-cluster test.py --conf conf/spark-defaults.conf

In spark-defaults.conf I declared where spark.yarn.jar is. I also checked the permissions on the spark.yarn.jar location and on /user/admin (the Spark user) to make sure they are read-write-execute for all.

In my test.py program, I have from pyspark.sql import SQLContext, and the first line is

sqlctx = SQLContext(sc)

and the error is

NameError: name 'sc' is not defined

on that line.

Any idea?

asked Jun 05 '15 by Tara


People also ask

How do I submit a Spark job in YARN cluster mode?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode, spark-submit runs the driver on a host in your driver resource group; the spark-submit syntax is --deploy-mode cluster, as in the example below.
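A rough illustration of that syntax (the application file name here is a placeholder, not from the original question):

spark-submit --master yarn --deploy-mode cluster your_app.py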

Where does Spark drive in YARN cluster mode?

In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

How do you define SC in PySpark?

In Spark/PySpark, 'sc' is a SparkContext object created up front by default in the spark-shell/pyspark shell; the same object is also available in Databricks. However, when you write a standalone PySpark program you need to create a SparkSession, which internally creates a SparkContext.
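For newer Spark versions (2.x and later, not the 1.3.1 used in the question), a minimal sketch of that pattern looks like this; the app name is just a placeholder:

from pyspark.sql import SparkSession

# Creating the SparkSession also creates a SparkContext internally
spark = SparkSession.builder.appName("my-app").getOrCreate()
sc = spark.sparkContext  # grab the SparkContext if you need it directly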


2 Answers

This is what worked for me:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("building a warehouse")
sc = SparkContext(conf=conf)
sqlCtx = SQLContext(sc)

Hope this helps.
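Once sqlCtx is created this way, it can be used just like in the shell. A minimal usage sketch against the same Spark 1.3-era API (the table and column names are made up for illustration):

from pyspark.sql import Row

# Build a small DataFrame from an RDD of Rows and query it with SQL
rdd = sc.parallelize([Row(name="a", value=1), Row(name="b", value=2)])
df = sqlCtx.createDataFrame(rdd)
df.registerTempTable("test_rows")  # hypothetical temp table name
sqlCtx.sql("SELECT name FROM test_rows").show()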

answered Oct 07 '22 by Tagar


sc is a helper value created automatically in the spark-shell/pyspark shell, but it is not created for you when you run a program with spark-submit. You must instantiate your own SparkContext and use it:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName)
sc = SparkContext(conf=conf)
answered Oct 07 '22 by Justin Pihony