I have an Oozie workflow running on a CDH4 cluster of four machines (one master-for-everything, three "dumb" workers). The Hive metastore runs on the master and uses MySQL (the driver is present); the Oozie server also runs on the master and uses MySQL, too. Using the web interface I can import and query Hive as expected, but when I run the same queries within an Oozie workflow, they fail. Even adding "IF EXISTS" leads to the error below. I tried adding the connection information as properties to the Hive job, without any success.
Can anybody give me a hint? Did I miss anything? Is any further information needed?
This is the output of the job's log:
Script [drop.sql] content:
------------------------
DROP TABLE IF EXISTS performance_log;
------------------------
Hive command arguments :
-f
drop.sql
=================================================================
>>> Invoking Hive command line now >>>
Intercepting System.exit(10001)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
Oozie Launcher failed, finishing Hadoop job gracefully
And this is the error message:
FAILED: SemanticException [Error 10001]: Table not found performance_log
Intercepting System.exit(10001)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
A workflow in Oozie is a sequence of actions arranged in a control-dependency DAG (Directed Acyclic Graph). Control dependency means that an action can only start once the preceding action has finished, and which action runs next depends on the outcome of the current one.
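To make the control dependency concrete, here is a minimal sketch of a workflow skeleton. The names "drop-table-wf", "fail" and "end" are made up for illustration, and the action body is left as a placeholder, so this would not validate as written; it only shows how the ok/error transitions form the graph.

```xml
<!-- Sketch only: the workflow and node names are illustrative, not taken from the question. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="drop-table-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <!-- the action body (for example a <hive> element) goes here -->
        <ok to="end"/>      <!-- followed when the action succeeds -->
        <error to="fail"/>  <!-- followed when the action fails -->
    </action>
    <kill name="fail">
        <message>Hive action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each node names its successor explicitly, which is why nothing after "hive-node" runs until that action has reported either success or failure.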
The Hive action needs to know the JobTracker and the NameNode of the underlying Hadoop cluster on which Oozie has to run it. The main elements of a Hive workflow action are described below.
The Hive query and the required configuration, libraries, and code for user-defined functions have to be packaged with the workflow application and deployed to HDFS.
These properties have to be passed in as configuration to Oozie's Hive action. The script element points to the actual Hive script to be run, and the <param> elements pass parameters to that script, which Hive resolves through variable substitution.
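As a rough sketch of how these elements fit together, the action below runs the drop.sql script from the question and passes one parameter to it. The parameter name TABLE is an assumption made up for this example; inside drop.sql it would be referenced as ${TABLE} instead of the hard-coded table name. The transition targets "end" and "fail" are also illustrative.

```xml
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <!-- jobTracker and nameNode are expected to be defined in the job properties -->
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- path is relative to the workflow application directory on HDFS -->
        <script>drop.sql</script>
        <!-- hypothetical parameter, passed to the script as NAME=value -->
        <param>TABLE=performance_log</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Keeping the table name in a <param> means the same script can be reused against a different table by changing only the workflow definition.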
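The problem is that the other nodes don't know where your MySQL is, so the Hive action cannot reach your metastore and you get the "Table not found" error.
You need to do 2 things: make your hive-site.xml available to the workflow, and tell the Hive action to use it through the <job-xml> element.
Something like below: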
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
This should work.
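For reference, here is a minimal sketch of the metastore part of such a hive-site.xml, assuming the metastore database lives in MySQL on the master as described in the question. The host name, database name, user and password are placeholders; the property names are the standard Hive JDO connection settings.

```xml
<!-- hive-site.xml, placed next to workflow.xml in the workflow directory on HDFS.
     Host, database, user and password below are placeholders, not values from the question. -->
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- use the master's real host name rather than localhost, so worker nodes can reach it -->
        <value>jdbc:mysql://master.example.com:3306/metastore</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>changeme</value>
    </property>
</configuration>
```

Since the Hive action is launched on one of the worker nodes, the MySQL JDBC driver jar most likely also has to be visible to the action there, for example in the workflow's lib directory on HDFS.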
Thanks