
How to organize a set of scientific experiments using Git

I'm running experiments on a model, with a workflow like this:

  • I work on a model (a piece of software written in Python).
  • I change some parameters and run an experiment.
  • I store the results of the experiment (as a pickle).
  • I analyze the (pickled) results using other software (IPython Notebooks).

I'm using "Git and Scientific Reproducibility" as a guide, where the results of an experiment are stored in a table alongside the hash of the commit. I would like to store the results in directories instead, naming each directory after the commit hash.

Thinking about version control, I would like to isolate the code from the analysis. For example, changing the color of a plot in an IPython notebook in analysis shouldn't change anything in code.

The approach I'm thinking:

A directory structure like this:

model
- code
- simulation_results
   - a83bc4
   - 23e900
   - etc 
- analysis

and different Git repositories for code and analysis, leaving simulation_results out of Git.

Any comments? A better solution? Thanks.

Victor asked Jan 24 '13 13:01



1 Answer

That seems sound. Your structure is a good fit for git submodules, with model becoming the parent git repo and code and analysis its submodules.

That way, each commit of the model repo records the SHA1 of both code and analysis, linking them together.
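
To see which submodule SHA1s a given checkout of the parent repo points at, you could parse the output of `git submodule status`. This is only a rough sketch: the function names are hypothetical, it assumes git is on the PATH, and it assumes submodule paths contain no spaces.

```python
import subprocess
from pathlib import Path


def parse_submodule_status(text: str) -> dict:
    """Map submodule path -> SHA1 from `git submodule status` output."""
    shas = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first character is a status flag (' ', '-', '+' or 'U');
        # the SHA1 and the submodule path follow it.
        fields = line[1:].split()
        shas[fields[1]] = fields[0]
    return shas


def submodule_shas(repo: Path) -> dict:
    """Run `git submodule status` in `repo` and parse the result."""
    out = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_submodule_status(out.stdout)
```

Calling `submodule_shas(Path("model"))` would then return something like a dict with one entry for `code` and one for `analysis`.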

That means you can name each directory inside the private (i.e. not versioned) model/simulation_results directory after the SHA1 of the model repo (the "parent" repo). That SHA1 pins the SHA1 of both the code and analysis submodules, so you can reproduce the experiment exactly, based on the exact content of both code and analysis.
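
A minimal sketch of that idea in Python, assuming the parent repo lives in `model/` and that `simulation_results` is ignored by it (for instance via `.gitignore`). The helper names (`current_commit`, `results_dir_for`, `save_results`) and the file name `results.pkl` are hypothetical, chosen only for illustration:

```python
import pickle
import subprocess
from pathlib import Path


def current_commit(repo: Path) -> str:
    """Return the full SHA1 of HEAD in the given repo (needs git on PATH)."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def results_dir_for(model_root: Path, sha: str, length: int = 6) -> Path:
    """Path of the results directory named after an abbreviated SHA1."""
    return model_root / "simulation_results" / sha[:length]


def save_results(model_root: Path, sha: str, results) -> Path:
    """Pickle `results` into the directory for commit `sha`; return the file path."""
    target = results_dir_for(model_root, sha)
    target.mkdir(parents=True, exist_ok=True)
    path = target / "results.pkl"
    with path.open("wb") as fh:
        pickle.dump(results, fh)
    return path
```

A run would end with something like `save_results(Path("model"), current_commit(Path("model")), results)`. The six-character abbreviation mirrors the directory names in the question; using the full SHA1 avoids any chance of two commits sharing a prefix. It is also worth committing in the parent repo before running, since uncommitted changes are not captured by the SHA1.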

VonC answered Oct 11 '22 12:10
