
Oozie workflow: Hive table not found but it does exist

I have an Oozie workflow running on a CDH4 cluster of four machines (one master-for-everything, three "dumb" workers). The Hive metastore runs on the master and uses MySQL (the driver is present); the Oozie server also runs on the master and uses MySQL as well. Using the web interface I can import and query Hive as expected, but when I run the same queries within an Oozie workflow, they fail. Even adding "IF EXISTS" leads to the error below. I tried adding the connection information as properties to the Hive job, without success.

Can anybody give me a hint? Did I miss anything? Any further information needed?

This is the output of the job's log:

  Script [drop.sql] content:
  ------------------------
  DROP TABLE IF EXISTS performance_log;

  ------------------------

  Hive command arguments :
  -f
  drop.sql

  =================================================================

  >>> Invoking Hive command line now >>>

  Intercepting System.exit(10001)

  <<< Invocation of Main class completed <<<

  Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]

  Oozie Launcher failed, finishing Hadoop job gracefully

And this is the error message:

  FAILED: SemanticException [Error 10001]: Table not found performance_log
  Intercepting System.exit(10001)
  Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]

Mario Mueller asked Apr 01 '13 19:04



1 Answer

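The problem is that the other nodes don't know where your MySQL metastore is, so you get the "Table not found" error. The Hive action runs on one of the worker nodes, and without your hive-site.xml it most likely falls back to a default local metastore that does not contain your table.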
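For reference, the metastore settings that the worker nodes are missing usually look like the excerpt below. This is only a sketch: the property names are the standard Hive ones, but the host name, database name, user, and password are placeholders that are not taken from the question and must match your own setup.

```xml
<!-- hive-site.xml (excerpt): metastore connection settings.
     Host, database name, user, and password are placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master.example.com:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>CHANGE_ME</value>
  </property>
</configuration>
```

If the file on your master already contains these properties, there is nothing to edit; it only has to be made available to the workflow.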

You need to do two things:

  1. Copy hive-site.xml into the Oozie workflow directory.
  2. In your Hive action, tell Oozie to use that hive-site.xml through the <job-xml> element.

Something like this:

  <action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <job-xml>hive-site.xml</job-xml>

This should work.
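To show where the <job-xml> element sits in a whole workflow, here is a minimal sketch of a complete workflow.xml. It assumes that hive-site.xml and drop.sql (the script name from the question's log) are stored next to workflow.xml in the workflow directory; the application name, the node names "fail" and "end", and the kill message are made up for illustration.

```xml
<!-- workflow.xml: minimal sketch; app name, node names, and kill message are hypothetical -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="drop-table-wf">
  <start to="hive-node"/>

  <action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- Assumes hive-site.xml was copied into the workflow directory -->
      <job-xml>hive-site.xml</job-xml>
      <!-- Script name taken from the job log in the question -->
      <script>drop.sql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Hive action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The order of the child elements matters, because the Hive action schema expects <job-xml> after <name-node> and before <script>.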

Thanks

user2230605 answered Jan 04 '23 06:01