I am a graduate student, and most of my research is done through software simulation. I have some C++ code that I use to generate results. For the sake of reproducibility, I want to save enough metadata in my binary that I can go back to the exact source code that generated it (mostly to see whether some bug I found invalidated a result I generated earlier).
In other words, when I produce a set of output files, I want the binary to dump both the git commit of the current revision as well as any outstanding changes. This would allow me (in theory) to check out that commit, apply the saved patch, and get back to the exact source code that created the binary.
I know I can do this out of band by manually saving the info or something, but in order to ensure full consistency, I'd like to just bake the information straight into the binary so that every binary will be able to be traced back to its exact source.
I'm familiar with doing things like setting a #define flag in the makefile to store something like the git commit SHA1, but I presume I need some kind of more clever way of storing the entire git diff as a string in the binary.
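For reference, the #define approach I mean looks roughly like the sketch below. GIT_COMMIT is just a name I picked, and the compiler command in the comment is one possible way to pass it in, not the only one.

```cpp
#include <string>

// GIT_COMMIT is a hypothetical macro name. The build is assumed to define it,
// for example from a shell:
//   g++ -DGIT_COMMIT="\"$(git rev-parse HEAD)\"" sim.cpp
// If the build does not define it, fall back to a recognizable placeholder.
#ifndef GIT_COMMIT
#define GIT_COMMIT "unknown"
#endif

// Returns the commit hash that was baked in at compile time.
inline std::string build_commit() {
    return GIT_COMMIT;
}
```

The simulation can then write build_commit() into the header of every output file it produces.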
So I have several questions:
Thanks.
Edit: I guess I didn't make clear that the reason I want to save the diff is to capture any uncommitted changes on top of the current HEAD. I can store the hash, but if I make the mistake of using a binary with some uncommitted stuff wrapped into it, then I can't get back the correct source.
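To make it concrete, here is a minimal sketch of what I have in mind, assuming a build step that writes the output of git diff HEAD into a C++ raw string literal before compiling. The names kCommit, kUncommittedPatch, and provenance_text are made up for illustration, and the patch text is a placeholder rather than real diff output.

```cpp
#include <sstream>
#include <string>

// Assumption: in a real build, both constants would live in a generated file
// that a script rewrites before each compile. They are inlined here so the
// sketch stands alone.
static const char kCommit[] = "unknown";

// The custom delimiter GITDIFF lets the literal hold quotes, backslashes, and
// parentheses unescaped. Compilers may cap string-literal length, so a very
// large diff might have to be split into several literals.
static const char kUncommittedPatch[] = R"GITDIFF(
(placeholder for `git diff HEAD` output)
)GITDIFF";

// Builds the provenance text that the binary would dump next to its results.
inline std::string provenance_text() {
    std::ostringstream out;
    out << "commit: " << kCommit << "\n";
    out << "patch:" << kUncommittedPatch;
    return out.str();
}
```

Checking out the recorded commit and applying the dumped patch with git apply should then, in theory, reproduce the source tree.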
Saving the git "id" number (hash) in your code is not a bad idea. Saving the diff is pretty pointless, as the hash (along with which branch it came from) should allow you to get back to the original code.
Just make sure your build-and-test system is set up so that you can't use something that hasn't been committed. That way, no random uncommitted changes can end up in the build.
Edit: There's a difference between testing on your machine, in a local copy of the project, and testing with the test suite that checks EVERYTHING, which is what you use to confirm that everything works, right? Note that it doesn't REALLY matter what you test until someone else gets a copy of that code. Don't let other people ever see your code until it's been committed, and don't allow the complete test suite (the one that saves the test results for the release notes, etc.) to run if you haven't committed everything [or better yet, have a separate directory/machine which ONLY gets fresh code from the central repo - if you do that, then you can't possibly use uncommitted code].
I have worked on several projects that work just that way - you can build in your local directory with uncommitted code, but all "official builds" are done on a different machine, code always straight from the repo, no local changes.
If you don't have two machines, you could use a virtual machine that "acts like a separate machine", or simply a second directory [or a different user?] that you use only for the "official tests".
Actually, you could simply CHECK whether there are any diffs, and then, along with your "this is the hash", add an extra "-with-uncommitted-changes" or something like that if there are any differences. You can use git diff --exit-code, which gives you an exit code of 0 for "no changes" or 1 for "changes".
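A rough sketch of that idea on the C++ side, assuming the build passes in two macros: GIT_COMMIT (the hash) and GIT_DIRTY (1 if the git check reported changes, 0 otherwise). Both macro names and the function names are made up here. One thing to be aware of: plain git diff compares the working tree against the index, so changes that are staged but not committed only show up if you run git diff HEAD --exit-code, and untracked files are not reported by either form.

```cpp
#include <string>

// Hypothetical macros, assumed to be set by the build. For example, a build
// script could run `git diff HEAD --exit-code > /dev/null` and pass
// -DGIT_DIRTY=1 when the exit code is 1.
#ifndef GIT_COMMIT
#define GIT_COMMIT "unknown"
#endif
#ifndef GIT_DIRTY
#define GIT_DIRTY 0
#endif

// Appends a marker to the hash when the build included uncommitted changes.
inline std::string version_label(const std::string& commit, bool dirty) {
    if (dirty) {
        return commit + "-with-uncommitted-changes";
    }
    return commit;
}

// Label for this particular build, derived from the macros above.
inline std::string build_label() {
    return version_label(GIT_COMMIT, GIT_DIRTY != 0);
}
```

Any result file stamped with a label ending in "-with-uncommitted-changes" then tells you immediately that the hash alone is not enough to get back to the source.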