I am quite new to Hadoop and I have currently been allocated a project on
"Implement a advanced job control framework to help chain multiple Map-Reduce jobs i.e. investigate/improve upon existing org.apache.hadoop.mapred.jobcontrol package."
This project is listed on the Project Suggestions page, under Random Ideas, at http://wiki.apache.org/hadoop/ProjectSuggestions#research_projects
My confusion is: do I have to build an advanced version of Oozie (which I think is a job control framework to chain multiple jobs) or something similar, or does this mean something completely different?
What am I missing?
It looks like the project you are referring to might be related to this Jira ticket.
Right now the JobControl class is pretty bare, and it's missing a number of functionalities which could make a user's life easier. For example:
Right now you call JobControl.run and that's it, but in practice it would be useful to get notified when something changes in my job.
A retry limit could be set on the ControlledJob class, so that a job is retried up to that point before a notification is sent that it failed.
In the end I don't think you need to reinvent a completely new framework; the JobControl class already provides a good starting point. Try to think from the user's point of view: what can you do to make it easier and shorter to submit and manage jobs? The ideas here and in the ticket are only examples; you are free to come up with your own.
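To make the two ideas above more concrete, here is a minimal, self-contained sketch of what "notify on state change" and "retry up to a limit" could look like. Everything in it is hypothetical: RetryingJobRunner, JobListener and the State enum are names invented for illustration and are not part of Hadoop or of the jobcontrol package. The job itself is modelled as a plain Callable<Boolean> so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch: none of these types exist in Hadoop. */
class RetryingJobRunner {

    /** Simplified lifecycle states, invented for this example. */
    enum State { RUNNING, RETRYING, SUCCESS, FAILED }

    /** Callback invoked on every state change (hypothetical interface). */
    interface JobListener {
        void onStateChange(String jobName, State newState, int attempt);
    }

    private final int maxAttempts;
    private final List<JobListener> listeners = new ArrayList<>();

    RetryingJobRunner(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    void addListener(JobListener listener) {
        listeners.add(listener);
    }

    private void fire(String jobName, State state, int attempt) {
        for (JobListener listener : listeners) {
            listener.onStateChange(jobName, state, attempt);
        }
    }

    /**
     * Runs the job, retrying until it succeeds or maxAttempts is reached.
     * The callable returns true when an attempt succeeded; an exception
     * is treated as a failed attempt.
     */
    boolean run(String jobName, Callable<Boolean> job) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            fire(jobName, attempt == 1 ? State.RUNNING : State.RETRYING, attempt);
            boolean ok;
            try {
                ok = Boolean.TRUE.equals(job.call());
            } catch (Exception e) {
                ok = false;
            }
            if (ok) {
                fire(jobName, State.SUCCESS, attempt);
                return true;
            }
        }
        // Only notify about failure once every attempt has been used up.
        fire(jobName, State.FAILED, maxAttempts);
        return false;
    }
}
```

A caller would register a listener with addListener and pass in a callable that submits the real job. With Hadoop, that callable could plausibly build a fresh Job and return the result of its waitForCompletion call, since a Job that has already been submitted cannot simply be submitted again. How this would hook into the existing JobControl and ControlledJob classes is left open here; that integration is the actual design work of the project.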
As far as Oozie is concerned, it gives you a higher-level abstraction for controlling a job flow, but it's also more complex to set up and should be reserved for more complex jobs. I know for a fact that some people are hesitant to use Oozie because it adds overhead to your applications. The other big difference is that Oozie is a server, which is additional overhead, while JobControl just runs on the client machine. While some of the features mentioned above are present in Oozie in one way or another, the ability to keep things simple and run on the client machine, without the extra work Oozie needs, is in my opinion the key to your project.