I have an Oozie coordinator that watches for a file to show up in a certain directory. This coordinator runs daily. If the file being watched shows up, a workflow is launched.
The workflow takes the watched file/directory as a parameter, which Oozie passes to it as a fully qualified path (e.g. hdfs://myhost/dir1/dir2/2015-02-17).
I need to extract /dir1/dir2/2015-02-17 and pass it into a Hive script, which doesn't seem to accept a fully qualified HDFS path. That means I need a workflow EL function to strip out the hdfs://myhost part, and I think replaceAll() will do this. The problem is passing the result into Hive.
Is there a way to use a workflow configuration property in the workflow itself?
For example, I want to be able to use 'dateToProcess', which is part of a directory name that is an input to the workflow:
<workflow-app name="mywf" xmlns="uri:oozie:workflow:0.4">
    <parameters>
        <property>
            <name>region</name>
        </property>
        <property>
            <name>hdfsDumpDir</name>
        </property>
        <property>
            <name>hdfsWatchDir</name>
            <value>${nameNode}${watchDir}</value>
        </property>
    </parameters>
    <start to="copy_to_entries"/>
    <action name="copy_to_entries">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>dateToProcess</name>
                    <value>${replaceAll(hdfsDumpDir, hdfsWatchDir,"")}</value>
                </property>
            </configuration>
            <script>myhivescript.q</script>
            <!-- Parameters referenced within Hive script. -->
            <param>INPUT_TABLE=dumptable</param>
            <param>INPUT_LOCATION=${watchDir}/${wf:conf('dateToProcess')}</param>
        </hive>
        <ok to="cleanup"/>
        <error to="sendEmailKill"/>
    </action>
    ...
</workflow-app>
I get an empty string when I use ${wf:conf('dateToProcess')}, and a "variable not found" error when I use ${dateToProcess}.
Any ideas?
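Remove this property from the action's <configuration> block:
    <property>
        <name>dateToProcess</name>
        <value>${replaceAll(hdfsDumpDir, hdfsWatchDir,"")}</value>
    </property>
and put the expression directly into the <param> instead. Properties set in an action's <configuration> block are handed to the launched Hive job; they are not added to the workflow's own configuration, which is why wf:conf('dateToProcess') comes back empty. That is: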
<param>INPUT_LOCATION=${watchDir}/${replaceAll(hdfsDumpDir, hdfsWatchDir,"")}</param>
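If the goal is only to drop the hdfs://myhost prefix, a variant along these lines might work. It is a sketch that assumes replaceAll() treats its second argument as a Java regular expression; the pattern is my own and does not come from the question:

```xml
<!-- Sketch: strip the scheme and authority (hdfs://myhost) from the watched path. -->
<!-- The pattern "^hdfs://[^/]+" is an assumption; adjust it if your URIs differ. -->
<param>INPUT_LOCATION=${replaceAll(hdfsDumpDir, "^hdfs://[^/]+", "")}</param>
```

With hdfsDumpDir set to hdfs://myhost/dir1/dir2/2015-02-17, this should leave /dir1/dir2/2015-02-17 for the Hive script.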
If you're going to be using this in more than one place, add the dateToProcess property to config-default.xml, and then it will be available as you intended.
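Another route, which the answer above does not mention, is to compute the value in the coordinator and pass it down as an ordinary workflow property, so that ${dateToProcess} resolves anywhere in the workflow. A rough sketch, assuming the coordinator's nominal time falls on the same date as the watched directory; wfAppPath and the data-in name input are hypothetical placeholders:

```xml
<!-- Fragment of coordinator.xml (sketch, not taken from the question). -->
<action>
    <workflow>
        <app-path>${wfAppPath}</app-path>
        <configuration>
            <!-- Fully qualified path of the dataset instance that triggered the run. -->
            <property>
                <name>hdfsDumpDir</name>
                <value>${coord:dataIn('input')}</value>
            </property>
            <!-- Date portion only, derived from the coordinator's nominal time. -->
            <property>
                <name>dateToProcess</name>
                <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
            </property>
        </configuration>
    </workflow>
</action>
```

Because the property then arrives as part of the workflow job's configuration, both ${dateToProcess} and ${wf:conf('dateToProcess')} should be able to see it.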