In Oozie, input-events are pretty straightforward: if the specified file/folder is not present, the coordinator job is kept in the WAITING state. But I could not understand what output-events does.
As per my understanding, the files/folders specified in the output-events tag should be created by Oozie if all specified actions are successful. But that does not happen. I cannot find any relevant logs either, and the documentation is not clear about this.
So, the question is: does Oozie really create the files/folders specified in output-events? Or does the tag merely declare that these files/folders are created during the workflow, leaving the responsibility for creating them to the jobs rather than to Oozie?
Relevant piece of code can be found at https://gist.github.com/venkateshshukla/de0dc395797a7ffba153
The official Oozie documentation for the Oozie Coordinator is not very clear on the exact purpose of the output-events element. However, the book "Apache Oozie: The Workflow Scheduler for Hadoop" mentions the following:
During reprocessing of a coordinator, Oozie tries to help the retry attempt by cleaning up the output directories by default. For this, it uses the <output-events> specification in the coordinator XML to remove the old output before running the new attempt. Users can override this default behavior using the -noCleanup option.
So, in summary:
- The files/folders named in output-events are not automatically created by Oozie; you need to create them in your Oozie workflow actions.
- The output-events configuration tells Oozie which files will be created by your workflow actions, and Oozie uses that information to clean up those files when rerunning/reprocessing a coordinator.

The actions always generate the data; these settings are only for control. You'll find some examples here.
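To make the division of responsibility concrete, here is a minimal sketch of a coordinator that declares both an input and an output dataset. It is an illustration under assumptions, not the code from the gist: the dataset names (raw, processed), the paths, the dates, and the property names inputDir and outputDir are all hypothetical, and ${nameNode} and ${workflowAppPath} are assumed to come from the job properties.

```xml
<coordinator-app name="example-coord" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2015-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Hypothetical input dataset: the coordinator waits for this to exist -->
    <dataset name="raw" frequency="${coord:days(1)}"
             initial-instance="2015-01-01T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/raw/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
    <!-- Hypothetical output dataset: Oozie does NOT create this directory -->
    <dataset name="processed" frequency="${coord:days(1)}"
             initial-instance="2015-01-01T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/processed/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- The action stays in WAITING until this instance is available -->
    <data-in name="input" dataset="raw">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <output-events>
    <!-- Only declares where the workflow will write; used for cleanup on rerun -->
    <data-out name="output" dataset="processed">
      <instance>${coord:current(0)}</instance>
    </data-out>
  </output-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <!-- Resolved paths are handed to the workflow, which must write the output itself -->
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
        <property>
          <name>outputDir</name>
          <value>${coord:dataOut('output')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

In this sketch the workflow receives the resolved output path through coord:dataOut('output') and is expected to write to it from one of its actions; the output-events block itself produces nothing.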