Submitting a Spark app as a YARN job from Eclipse and the Spark context

I can already submit local Spark jobs (written in Scala) from my Eclipse IDE. However, I would like to modify my Spark context (inside my application) so that when I 'Run' the app (inside Eclipse), the job is sent to my remote cluster with YARN as the resource manager.

Using spark-submit, I can successfully submit the job to the cluster with:

    spark-submit --class <main class> --master yarn-cluster <jar>

I want to achieve the same result inside the IDE. My sbt config (in the app's root directory) looks like:

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
    libraryDependencies += "org.apache.spark" %% "spark-yarn" % "1.5.1" % "provided"

Inside my app:

    val conf = new SparkConf().setAppName("xxx").setMaster("yarn-cluster")

However, I am getting the following error:

Detected yarn-cluster mode, but isn't running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.

Neel asked Jan 27 '16
1 Answer

1) From the research I have done, you cannot use yarn-cluster as the master in your code when submitting remotely from Eclipse; use yarn-client instead.

new SparkConf().setAppName("test-app").setMaster("yarn-client");

Check this Cloudera resource; it sheds some light on what might be the constraint preventing you from running your "interactive" application in cluster mode.
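Putting it together, here is a minimal sketch of a self-contained app submitted from the IDE in yarn-client mode. The object name, app name, jar path, and the sample computation are placeholders added for illustration, not part of the original answer:

    import org.apache.spark.{SparkConf, SparkContext}

    object RemoteYarnApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("test-app")    // placeholder name
          .setMaster("yarn-client")  // driver runs in the IDE, executors on YARN
          // ship the application jar so the YARN executors can load your classes;
          // the path below is a placeholder for your built artifact
          .setJars(Seq("target/scala-2.10/my-app_2.10-1.0.jar"))
        val sc = new SparkContext(conf)
        println(sc.parallelize(1 to 100).sum())  // trivial job to verify the setup
        sc.stop()
      }
    }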

2) You might run into a problem with resources not being copied to the cluster properly. What solved the problem in my case was including the following files in the classpath of the project (nothing fancy; for now I just copied them into the src/java directory of the project):

  • core-site.xml
  • hdfs-site.xml
  • yarn-site.xml

Ensure that core-site.xml in particular is on the classpath, because none of the tutorials I have read mentioned it. Without the fs.defaultFS configuration present, Spark will assume that the destination directory is on the same filesystem as the source (your local filesystem) rather than the remote HDFS filesystem.
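If in doubt, a quick sanity check (just a sketch; the address in the comment is a placeholder for your own namenode) is to read fs.defaultFS back from the SparkContext's Hadoop configuration:

    // prints something like hdfs://namenode:8020 once core-site.xml is picked up;
    // if it prints file:/// the XML files are not on the classpath
    println(sc.hadoopConfiguration.get("fs.defaultFS"))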

Serhiy answered Oct 30 '22