I have written an application in Scala that uses Spark. The application consists of two modules - the App module, which contains classes with different logic, and the Env module, which contains environment and system initialization code as well as utility functions. The entry point is located in Env, and after initialization it creates a class in App (according to args, using Class.forName) and the logic is executed. The modules are exported into two different JARs (namely, env.jar and app.jar).
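For context, the entry point in Env looks roughly like this (a minimal sketch; the Task trait and its run method are hypothetical stand-ins for the actual App classes):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical contract implemented by the App module's task classes.
    trait Task extends Serializable {
      def run(sc: SparkContext): Unit
    }

    object Main {
      def main(args: Array[String]): Unit = {
        // No setMaster() here - spark.master is expected to come from
        // spark-submit / the Oozie Spark action, not from the code.
        val conf = new SparkConf().setAppName("myApp")
        val sc = new SparkContext(conf)

        // args(0) names the task class from the App module,
        // e.g. "app.AggBlock1Task", and is loaded reflectively.
        val task = Class.forName(args(0)).newInstance().asInstanceOf[Task]
        task.run(sc)

        sc.stop()
      }
    }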
When I run the application locally, it executes well. The next step is to deploy the application to my servers. I use Cloudera's CDH 5.4.
I used Hue to create a new Oozie workflow with a Spark task with the following parameters:
Spark Master: yarn
Mode: cluster
App name: myApp
Jar/py files: lib/env.jar,lib/app.jar
Main class: env.Main (in the Env module)
Arguments: app.AggBlock1Task
I then placed the 2 JARs inside the lib folder in the workflow's folder (/user/hue/oozie/workspaces/hue-oozie-1439807802.48).
When I run the workflow, it throws a FileNotFoundException and the application does not execute:
java.io.FileNotFoundException: File file:/cloudera/yarn/nm/usercache/danny/appcache/application_1439823995861_0029/container_1439823995861_0029_01_000001/lib/app.jar,lib/env.jar does not exist
However, when I leave the Spark master and mode parameters empty, it all works properly, but when I check spark.master programmatically it is set to local[*] and not yarn. Also, when observing the logs, I encountered this under the Oozie Spark action configuration:
--master
null
--name
myApp
--class
env.Main
--verbose
lib/env.jar,lib/app.jar
app.AggBlock1Task
I assume this isn't the right way to do it - leaving the Spark master and mode parameters empty and running the application with spark.master set to local[*]. As far as I understand, creating a SparkConf object within the application should set the spark.master property to whatever I specify in Oozie (in this case yarn), but it just doesn't work when I do that.
Is there something I'm doing wrong or missing?
Any help will be much appreciated!
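The programmatic check mentioned above is just reading the master off the running SparkContext, roughly like this (sc being the context created at startup):

    // Print the effective master of the running application,
    // e.g. "local[*]" when the master/mode parameters are left empty.
    println(sc.master)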
I managed to solve the problem by putting the two JARs in the user directory /user/danny/app/ and specifying the Jar/py files parameter as ${nameNode}/user/danny/app/env.jar. Running it caused a ClassNotFoundException to be thrown, even though the JAR was located in the same folder in HDFS. To work around that, I had to go to the settings and add the following to the options list: --jars ${nameNode}/user/danny/app/app.jar. This way the App module is referenced as well and the application runs successfully.
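For reference, the resulting Spark action in the generated workflow.xml should correspond roughly to the following (a sketch following the uri:oozie:spark-action:0.1 schema, not the exact XML Hue produces; the action and transition names are illustrative):

    <action name="spark-task">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>myApp</name>
            <class>env.Main</class>
            <jar>${nameNode}/user/danny/app/env.jar</jar>
            <spark-opts>--jars ${nameNode}/user/danny/app/app.jar</spark-opts>
            <arg>app.AggBlock1Task</arg>
        </spark>
        <ok to="end"/>
        <error to="kill"/>
    </action>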