Repository organization for Hadoop project

I am starting a new Hadoop project that will have multiple Hadoop jobs (and hence multiple JAR files). I'm using Mercurial for source control and was wondering what the optimal way of organizing the repository structure would be. Should each job live in a separate repo, or would it be more efficient to keep them in the same repo, broken down into folders?

247

asked Jun 02 '10 00:06

Alex N.

1 Answer

If you're pipelining the Hadoop jobs (the output of one is the input of another), I've found it's better to keep most of the code in the same repository, since I tend to write a lot of common methods that I can reuse across the various MR jobs.

Personally, I keep the streaming jobs in a separate repo from my more traditional jobs since there are generally no dependencies.

Are you planning on using the DistributedCache or streaming jobs? You might want a separate directory for files you distribute. Do you really need a JAR per Hadoop job? I've found I don't.
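To illustrate the "one JAR for several jobs" idea, here is a minimal sketch of a single entry point that picks a job by name from the first command-line argument. Everything in it is an assumption for illustration: the `JobDispatcher` class, the `Job` interface, and the job names `wordcount` and `sessionize` are hypothetical, and the job bodies are placeholders where real code would configure and submit a MapReduce job. Hadoop itself ships `org.apache.hadoop.util.ProgramDriver`, which serves a similar purpose, so in practice you may prefer that over rolling your own.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical single entry point for a JAR that bundles several jobs. */
final class JobDispatcher {

    /** A job receives its own arguments and returns an exit code (0 = success). */
    public interface Job {
        int run(String[] args);
    }

    // Insertion-ordered so the usage message lists jobs in registration order.
    private final Map<String, Job> jobs = new LinkedHashMap<>();

    /** Registers a job under the name used to select it on the command line. */
    public void register(String name, Job job) {
        jobs.put(name, job);
    }

    /** Runs the job named by args[0], forwarding the remaining arguments to it. */
    public int run(String[] args) {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.println("Usage: <job> [args...]; known jobs: " + jobs.keySet());
            return -1;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        return jobs.get(args[0]).run(rest);
    }

    public static void main(String[] args) {
        JobDispatcher dispatcher = new JobDispatcher();
        // Placeholder jobs: a real implementation would build and submit
        // the corresponding MapReduce job here instead of printing.
        dispatcher.register("wordcount", a -> {
            System.out.println("would run wordcount with " + a.length + " argument(s)");
            return 0;
        });
        dispatcher.register("sessionize", a -> {
            System.out.println("would run sessionize with " + a.length + " argument(s)");
            return 0;
        });
        System.exit(dispatcher.run(args));
    }
}
```

With a layout like this, all jobs and their shared helper code can live in one repository and build into one JAR, and each job is selected at launch time by passing its name as the first argument after the main class.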

If you give more details about what you plan on doing with Hadoop, I can see what else I can suggest.

162

answered Oct 13 '22 07:10

Eric Wendelin

Tags: repository, organization, mercurial, hadoop
