
How can I use an Oozie workflow configuration property in the workflow itself?

Tags: hadoop, hive, oozie

I have an Oozie coordinator that watches for a file to show up in a certain directory. This coordinator runs daily. If the file being watched shows up, a workflow is launched.

The workflow takes the file/directory being watched as a parameter, which Oozie passes to it. It is a fully qualified path (e.g. hdfs://myhost/dir1/dir2/2015-02-17).

I need to extract /dir1/dir2/2015-02-17 and pass it to a Hive script, which doesn't seem to accept a fully qualified HDFS path. That means I need a workflow EL function to strip the hdfs://myhost part; I think replaceAll() will do this. The problem is passing the result to Hive.

Is there a way to use a workflow configuration property in the workflow itself?

For example, I want to be able to use 'dateToProcess' which is part of a directory name that is an input to the workflow:

  <workflow-app name="mywf" xmlns="uri:oozie:workflow:0.4">
    <parameters>
      <property>
        <name>region</name>
      </property>
      <property>
        <name>hdfsDumpDir</name>
      </property>
      <property>
        <name>hdfsWatchDir</name>
        <value>${nameNode}${watchDir}</value>
      </property>
    </parameters>

    <start to="copy_to_entries"/>
    <action name="copy_to_entries">
      <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml>

        <configuration>
          <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
          </property>
          <property>
            <name>dateToProcess</name>
            <value>${replaceAll(hdfsDumpDir, hdfsWatchDir, "")}</value>
          </property>
        </configuration>

        <script>myhivescript.q</script>
        <!-- Parameters referenced within the Hive script. -->
        <param>INPUT_TABLE=dumptable</param>
        <param>INPUT_LOCATION=${watchDir}/${wf:conf('dateToProcess')}</param>
      </hive>
      <ok to="cleanup"/>
      <error to="sendEmailKill"/>
    </action>
    ...
  </workflow-app>

I get an empty string when I use ${wf:conf('dateToProcess')}, and a "variable not found" error when I use ${dateToProcess}.

Any ideas?

Shinta Smith asked Sep 29 '22 17:09



1 Answer

Remove the dateToProcess property from the action's <configuration>

    <property>
      <name>dateToProcess</name>
      <value>${replaceAll(hdfsDumpDir, hdfsWatchDir,"")}</value>
    </property>

and instead place the expression directly in the <param>, i.e.

  <param>INPUT_LOCATION=${watchDir}/${replaceAll(hdfsDumpDir, hdfsWatchDir,"")}</param>

If you're going to be using this in more than one place, add the dateToProcess property to config-default.xml, and then it will be available as you intended.
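
For context, here is a rough sketch of the whole action with the inline expression. The likely reason the original returned an empty string is that properties under the action's <configuration> are handed to the Hive job rather than added to the workflow job configuration, and wf:conf() returns an empty string for a property the workflow job configuration does not define. The sample values in the comment are assumptions for illustration only; everything else is taken from the question.

```xml
<!-- Sketch only. Assumed sample values (not from the question's job.properties):
       hdfsDumpDir  = hdfs://myhost/dir1/dir2/2015-02-17
       hdfsWatchDir = hdfs://myhost/dir1/dir2
       watchDir     = /dir1/dir2
     With these, replaceAll(hdfsDumpDir, hdfsWatchDir, "") yields /2015-02-17 -->
<action name="copy_to_entries">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>hive-site.xml</job-xml>
    <configuration>
      <!-- Only Hadoop/Hive job settings belong here -->
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
    </configuration>
    <script>myhivescript.q</script>
    <param>INPUT_TABLE=dumptable</param>
    <!-- The EL expression is resolved by Oozie before Hive sees the parameter -->
    <param>INPUT_LOCATION=${watchDir}/${replaceAll(hdfsDumpDir, hdfsWatchDir, "")}</param>
  </hive>
  <ok to="cleanup"/>
  <error to="sendEmailKill"/>
</action>
```

Two things to watch: replaceAll() treats its second argument as a regular expression, so characters such as `.` in the path match more loosely than a literal comparison would. Also, with the assumed values above the remainder keeps its leading slash, so INPUT_LOCATION would come out as /dir1/dir2//2015-02-17; if that matters for your script, drop the literal `/` from the <param>.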

Ben Watson answered Nov 15 '22 09:11
