"Logbook" for scientific simulations

Question

I'm using C++ to run scientific simulations. With the number of parameters growing, I find it necessary to keep a "logbook": a file that stores all the information about a given simulation (not the output, but the parameters that led to that output and the corresponding git commit).

I've searched, and it seems to me that XML should be a good option, since it can easily be parsed with Python, Mathematica, or other analysis software.

I wonder if anyone agrees with this, or has a better option.

I'd also like to know how to get the current git commit so that I can save it in the logbook.

hroptatyr · Accepted Answer

In general I agree with you:

XML is widely deployed; there are tons of tools to bring the logbook into shape.
It's flexible: you can add attributes later without breaking old "scripts".
It's file based: one document, one file; use the filesystem to organise logbook "pages".
It's file based and plain text: tools like find, grep, and diff (at a push) can help you in urgent cases.
It's your own solution, you're free to track any information you need, and if you deem it essential to associate sunlight hours with the parameters, do it.

That being said, I should add that the right storage format depends on the typical use case. If you need to find out why the optimiser cannot find any solutions every Monday after a full moon, it will be hard (well, harder) to come up with the necessary XPath/XQuery hackery, because of the non-normative nature of your structure.

Here are all the downsides I can think of:

It's verbose. XML documents in my area tend to be around 20 to 40 GB, whereas the information could probably be represented in about 500 MB.
It's slow (depending on how you use it). RDBMSs and even NoSQL solutions employ techniques like indexing to make reading faster.
It's flexible, which is also a downside: if you add two new attributes per day, you will end up with nothing but marked-up free text, which will need thorough polishing before you can import it into structure-focussed systems (SQL, CSV, JSON, ...).
It's your own solution: you have to write it and maintain it.

As for the second bit: git describe --always HEAD
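If you want the simulation itself to record that identifier at run time, one option is to run the command and capture its output. The following is a rough sketch that assumes a POSIX system (popen and pclose are not standard C++), that git is on the PATH, and that the program is started from inside the repository. The function names capture_command and current_git_commit are invented for this example.

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>

// Run a shell command and return its standard output with trailing
// whitespace removed. Throws if the command cannot be started or
// exits with a non-zero status.
std::string capture_command(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("could not start: " + command);
    }
    std::string output;
    std::array<char, 256> buffer;
    while (fgets(buffer.data(), static_cast<int>(buffer.size()), pipe) != nullptr) {
        output += buffer.data();
    }
    if (pclose(pipe) != 0) {
        throw std::runtime_error("command failed: " + command);
    }
    while (!output.empty() &&
           (output.back() == '\n' || output.back() == '\r' || output.back() == ' ')) {
        output.pop_back();
    }
    return output;
}

// Hypothetical helper: identifier of the commit checked out in the
// current working directory, as reported by git describe.
std::string current_git_commit() {
    return capture_command("git describe --always HEAD");
}
```

An alternative is to run the same command in the build system and compile the result in as a string constant, so the binary does not depend on git being available when it runs. Either way, note that the identifier says nothing about uncommitted changes; dropping HEAD and running git describe --always --dirty marks a modified working tree with a -dirty suffix.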

thiton · Answer

The easiest option is to make your program a pure function, i.e. externalize all changing and possibly changing parameters into program options so that a simulation is completely specified by the options and a git commit identifier.

Boost.Program_options aids greatly in implementing such a scheme.
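As a rough illustration of the idea, the sketch below collects --name=value options into a map and rebuilds a canonical command line from it, which together with the commit identifier is all the logbook needs to store. It deliberately uses only the standard library so that it compiles without Boost; in a real program Boost.Program_options would take the place of the hand-rolled parse_options. Both function names and the --name=value convention are assumptions of this sketch.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Parse arguments of the form --name=value into a map sorted by name.
// Anything that does not match that form is rejected, so no parameter
// can slip into a run without being recorded.
std::map<std::string, std::string>
parse_options(const std::vector<std::string>& args) {
    std::map<std::string, std::string> options;
    for (const std::string& arg : args) {
        const std::size_t eq = arg.find('=');
        const bool has_prefix = arg.compare(0, 2, "--") == 0;
        if (!has_prefix || eq == std::string::npos || eq == 2) {
            throw std::invalid_argument("expected --name=value, got: " + arg);
        }
        options[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
    }
    return options;
}

// Rebuild a command line that reproduces the run. Because the map is
// sorted, the same set of options always yields the same string.
std::string canonical_command_line(
        const std::string& program,
        const std::map<std::string, std::string>& options) {
    std::string line = program;
    for (const auto& kv : options) {
        line += " --" + kv.first + "=" + kv.second;
    }
    return line;
}
```

In main you would build the vector from argv + 1 up to argv + argc, then write the canonical command line and the commit identifier into the logbook before the simulation starts. Values containing spaces would need quoting before the line can be pasted back into a shell, which this sketch does not handle.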

Fomite · Answer

This may sound odd on a programming site, but while doing several bits of simulation work I found that the best log book was... well... a log book.

Specifically, I've used this one extensively (link to Amazon). It may be because I came from a wet lab/biology background, but I found something appealing about an old dead-tree notebook. It's admittedly not automated, and it won't do well if you're running a huge number of different parameter combinations or if your simulation has a large number of parameters to begin with.

But for the project I was working on, which had 20 or so parameters that might vary, I liked being able to record freeform notes about my thoughts and to have them in an easily portable, easy-to-recall and fairly durable form. For many fellow lab mates, too, "Keep a lab notebook" seemed to work better with a physical thing.

Your mileage may, of course, vary.

"Logbook" for scientific simulations

Tags:

c++

logging

scientific-computing

Jorge Leitao
