How to organize a set of scientific experiments using Git

Question

I'm running experiments on a model, with a workflow like this:

1. I work on a model (a piece of software written in Python).
2. I change some parameters and run an experiment.
3. I store the results of the experiment (as a pickle).
4. I analyze the (pickled) results with other software (IPython Notebooks).

I'm using "Git and Scientific Reproducibility" as a guide, where the results of an experiment are stored in a table alongside the hash of the commit. I would like to store the results in directories instead, naming each directory after the commit hash.

As for version control, I would like to isolate the code from the analysis. For example, changing the color of a plot in an IPython notebook in analysis shouldn't change anything in code.

The approach I'm thinking of:

A directory structure like this:

model
- code
- simulation_results
   - a83bc4
   - 23e900
   - etc 
- analysis

and different Git repositories for code and analysis, leaving simulation_results out of Git.

Any comments? A better solution? Thanks.

VonC · Accepted Answer

That seems sound, and your structure would be a good fit for git submodules, with model becoming a parent Git repo that references code and analysis as submodules.

That way, each commit of the model repo records the SHA1 of both code and analysis, linking the two together.
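
To see what the parent repo has pinned, something like the following might help. It is only a sketch: the function names `parse_submodule_status` and `submodule_pins` are made up for illustration, and it assumes `git` is on the PATH and that code and analysis have already been added as submodules of model. It reads the output of `git submodule status`, where each line starts with a one-character status flag followed by the SHA1 and the submodule path.

```python
import subprocess


def parse_submodule_status(status_text):
    """Map each submodule path to the SHA1 listed in `git submodule status` output."""
    pins = {}
    for line in status_text.splitlines():
        if not line.strip():
            continue
        # The first character is a status flag (' ', '-', '+' or 'U');
        # the SHA1 and the path follow, separated by whitespace.
        fields = line[1:].split()
        sha1, path = fields[0], fields[1]
        pins[path] = sha1
    return pins


def submodule_pins(parent_repo="."):
    """Run `git submodule status` in the parent repo and parse its output.

    `parent_repo` is a hypothetical argument: the path to the model directory.
    """
    completed = subprocess.run(
        ["git", "submodule", "status"],
        cwd=parent_repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_submodule_status(completed.stdout)
```

Calling `submodule_pins("model")` from the directory above model would then return a dict keyed by submodule path (here, code and analysis).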

That means you can name each directory within the private (i.e. not versioned) directory model/simulation_results after the SHA1 of the model repo (the "parent" repo): that SHA1 pins the SHA1 of both the code and analysis submodules, which means you can reproduce the experiment exactly (based on the exact content of both code and analysis).
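
As a rough illustration of that naming scheme, here is a minimal Python sketch. The helper names `parent_sha1` and `save_results`, the default file name `results.pkl`, and the `results_root` argument are all assumptions made for the example, not part of any existing tool; it also assumes `git` is on the PATH.

```python
import pickle
import subprocess
from pathlib import Path


def parent_sha1(parent_repo="."):
    """Return the SHA1 of the commit currently checked out in the parent repo."""
    completed = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=parent_repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def save_results(results, sha1, results_root="simulation_results",
                 filename="results.pkl"):
    """Pickle `results` into <results_root>/<sha1>/<filename> and return the path."""
    run_dir = Path(results_root) / sha1
    # Create the per-commit directory if it does not exist yet.
    run_dir.mkdir(parents=True, exist_ok=True)
    out_path = run_dir / filename
    with open(out_path, "wb") as fh:
        pickle.dump(results, fh)
    return out_path
```

Run from inside model, `save_results(results, parent_sha1())` would write to simulation_results/<SHA1>/results.pkl. Note that the SHA1 only identifies the experiment if everything was committed before the run; checking that `git status --porcelain` prints nothing is one way to guard against that.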

Tags:

git

scientific-computing

Victor
