I'm running experiments on a model, with a workflow like this:
I'm using "Git and Scientific Reproducibility" as a guide, where the results of an experiment are stored in a table alongside the hash of the commit. I would like to store the results in directories instead, naming each directory after the commit hash.
Thinking about version control, I would like to isolate code and analysis. For example, changing the color of a plot in an IPython notebook in analysis shouldn't change anything in code.
The approach I'm thinking:
A directory structure like this:
model
- code
- simulation_results
- a83bc4
- 23e900
- etc
- analysis
and different Git repositories for code and analysis, leaving simulation_results out of Git.
Any comments? A better solution? Thanks.
That seems sound, and your structure would be a good fit for git submodules, with model becoming a parent Git repo.
That way, the model repo records the SHA1 of both code and analysis together.
That means you can create each results directory within the private (i.e. not versioned) directory model/simulation_results based on the SHA1 of the model repo (the "parent" repo). That SHA1 pins the SHA1 of both the code and analysis submodules, which means you can reproduce the experiment exactly (based on the exact content of both code and analysis).
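As a rough illustration of naming a results directory after the parent repo's commit, here is a minimal Python sketch. The helper names (current_commit, is_clean, results_dir) and the six-character abbreviation are assumptions made for this example, chosen to match the directory names in the question; they are not part of Git. It only relies on the standard git rev-parse HEAD and git status --porcelain commands.

```python
import subprocess
from pathlib import Path


def current_commit(repo: Path) -> str:
    """Return the full SHA1 of HEAD in the given Git repository."""
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def is_clean(repo: Path) -> bool:
    """True if git status reports no changes to tracked files.

    Untracked files are skipped so that the unversioned
    simulation_results directory does not count as a change.
    """
    out = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain",
         "--untracked-files=no"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip() == ""


def results_dir(model_root: Path, sha: str, length: int = 6) -> Path:
    """Create and return model_root/simulation_results/<abbreviated sha>.

    The abbreviation length is a hypothetical default; a longer prefix
    (or the full hash) lowers the risk of two commits sharing a name.
    """
    if len(sha) < length:
        raise ValueError("commit hash is shorter than the requested prefix")
    target = model_root / "simulation_results" / sha[:length]
    target.mkdir(parents=True, exist_ok=True)
    return target
```

A run script could call is_clean(model_root) first and refuse to start if it returns False, then write its output into results_dir(model_root, current_commit(model_root)). Checking for a clean tree matters because a results directory named after a commit is only reproducible if nothing uncommitted was in play when the simulation ran.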