
Pros and cons for keeping code and data in separate repositories

We have a project which has data and code, bundled into a single Mercurial repository. The data is just as important as the code (it contains parameters for business logic, some inputs, etc.). However, the format of the data files changes rarely, and it's quite natural to change the data files independently from the code.

One advantage of the unified repository is that we don't have to keep track of multiple revisions: if we ever need to recreate output from a previous run, we only need to update the system to the single revision number stored in the output log.

One disadvantage is that if we modify the data while multiple heads are active, we may lose the data changes unless we manually copy those changes to each head.

Are there any other pros/cons to splitting the code and the data into separate repositories?

max asked Nov 30 '12 06:11





1 Answer

Multiple repos:

  • pros:

    • component-based approach (you identify groups of files that can evolve independently of one another)
    • configuration specification: you list the references (here "revisions") you need for your system to work. If you want to modify one part without changing the other, you update that list.
    • partial clones: if you don't need all components, you can clone only the ones you want (doesn't apply in your case)
  • cons

    • configuration management: you need to track that configuration (usually through a parent repo, registering subrepos)
    • in your case, data is quite dependent on certain versions of the projects (you can have new data which doesn't make sense for old versions of the project)
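
To make the "configuration specification" point concrete: with separate repos, the output log has to record one revision per repo instead of a single number. Here is a minimal sketch in Python, assuming two clones in hypothetical directories named `code` and `data`. The helper names are made up for illustration; the only Mercurial call used is `hg id -i`, which prints the changeset ID of the working directory.

```python
import subprocess


def hg_revision(repo_path):
    """Return the working-directory changeset ID reported by `hg id -i`."""
    result = subprocess.run(
        ["hg", "id", "-i"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def format_configuration(revisions):
    """Render {component: revision} as one log line, sorted by component name."""
    return " ".join(f"{name}={rev}" for name, rev in sorted(revisions.items()))


def is_reproducible(revisions):
    """Return False if any component has uncommitted changes."""
    # Mercurial appends "+" to the ID when the working directory is modified.
    return not any(rev.endswith("+") for rev in revisions.values())
```

At the start of a run you would call something like `format_configuration({"code": hg_revision("code"), "data": hg_revision("data")})` and write the result to the output log. Recreating an old run then means updating each repo to its recorded revision, which is the extra bookkeeping the single-repo setup avoids.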

One repo:

  • pros
    • system-based approach: you see your modules as one system (project and data).
    • repo management: all in one
    • tight link between modules (which can make sense for data)
  • cons
    • data propagation (when, as you mention, several heads are active)
    • intermediate revisions (commits that exist only because some data changed, not because of a new feature)
    • larger clone (not relevant here, unless your data include large binaries)

For non-binary data with infrequent changes, I would still keep it in the same repo.

VonC answered Oct 22 '22 16:10
