
Handling loops in oozie workflow

Tags: hadoop, oozie

I have an Oozie use case: check whether input data is available and trigger a MapReduce job based on that. I wrote a shell script that checks the input data and created an ssh action for it in Oozie.

The number of retries and the retry interval for the input-data check should be configurable. After each retry, if the data is still missing, I have to send an alert; after the specified number of retries, the MapReduce job can start with whatever data is available.

I wrote the workflow as follows:

<start to="datacheck" />

<action name="datacheck">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>${sshUserHost}</host>
        <command>${Oozieutilsscript}</command>
    </ssh>
    <ok to="datacheckswitch" />
    <error to="fail" />
</action>

<decision name="datacheckswitch">
    <switch>
        <case to="mapreduce">${(wf:actionData('datacheck')['datatransfer'] == "complete" )}</case>
        <case to="retry">${(wf:actionData('datacheck')['datatransfer'] == "incomplete" )}</case>        
        <default to="fail" />    
    </switch>
</decision>

<action name="retry">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>${sshUserHost}</host>
        <command>${Oozieutilsscript1}</command>
    </ssh>
    <ok to="retryswitch" />
    <error to="fail" />
</action>

<decision name="retryswitch">
    <switch>
        <case to="datacheck">${(wf:actionData('retry')['retry'] == "notfinished" )}</case>
        <case to="mapreduce">${(wf:actionData('retry')['retry'] == "finished" )}</case>
        <default to="fail" />    
    </switch>
</decision>

<action name="mapreduce">
...............
</action>


<!--Kill and End portion-->
<kill name="fail">
    <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />

Only when I ran the workflow did I learn that Oozie doesn't support cycles, because a workflow must be a DAG. I got this error: Error: E0707 : E0707: Loop detected at parsing, node [datacheck] while parsing workflow.xml

Is there a different approach for handling this use case?

Any help is appreciated.

SachinJ asked Dec 21 '22 02:12


1 Answer

You can simulate loops using recursion. The key idea is that a workflow calls itself using a sub-workflow action that points to the workflow file that contains the action node.

The recursion must be stopped using a decision node.

On my blog you can find a complete example for this.
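
A minimal sketch of the recursion idea applied to the workflow in the question (this is not the blog example). It assumes two job properties that are not in the original, attempt (starting at 0) and maxRetries, and it assumes that ${Oozieutilsscript1} sends the alert and waits for the retry interval. The workflow name and the node names alert and recurse are made up for illustration. Instead of a transition back to datacheck, the workflow starts itself as a sub-workflow with attempt increased by one, so the graph stays a DAG.

```xml
<workflow-app name="datacheck-loop" xmlns="uri:oozie:workflow:0.4">
    <start to="datacheck" />

    <action name="datacheck">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${sshUserHost}</host>
            <command>${Oozieutilsscript}</command>
            <!-- wf:actionData() only sees key=value lines if output is captured -->
            <capture-output />
        </ssh>
        <ok to="datacheckswitch" />
        <error to="fail" />
    </action>

    <!-- The decision node is what stops the recursion. -->
    <decision name="datacheckswitch">
        <switch>
            <case to="mapreduce">${wf:actionData('datacheck')['datatransfer'] eq 'complete'}</case>
            <!-- Subtracting forces a numeric comparison; comparing the two
                 properties directly would compare them as strings. -->
            <case to="alert">${wf:actionData('datacheck')['datatransfer'] eq 'incomplete' and (maxRetries - attempt) gt 0}</case>
            <!-- Retries used up: run with whatever data is available. -->
            <case to="mapreduce">${wf:actionData('datacheck')['datatransfer'] eq 'incomplete'}</case>
            <default to="fail" />
        </switch>
    </decision>

    <!-- Assumption: this script sends the alert and sleeps for the retry interval. -->
    <action name="alert">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${sshUserHost}</host>
            <command>${Oozieutilsscript1}</command>
        </ssh>
        <ok to="recurse" />
        <error to="fail" />
    </action>

    <!-- The workflow calls itself; each level is one more attempt. -->
    <action name="recurse">
        <sub-workflow>
            <app-path>${wf:appPath()}</app-path>
            <propagate-configuration />
            <configuration>
                <property>
                    <name>attempt</name>
                    <value>${attempt + 1}</value>
                </property>
            </configuration>
        </sub-workflow>
        <ok to="end" />
        <error to="fail" />
    </action>

    <action name="mapreduce">
        <!-- body elided in the question; left elided here -->
        ...............
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end" />
</workflow-app>
```

With attempt=0 and, say, maxRetries=3 in job.properties, the check runs at most four times before the MapReduce job starts with the available data. Each retry is a nested sub-workflow, and Oozie may cap the nesting depth, so this suits a small, bounded number of retries rather than open-ended polling.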

Helmut Zechmann answered Jan 12 '23 16:01