
"Logbook" for scientific simulations

I'm using C++ to run scientific simulations. Because the number of parameters keeps growing, I now find it necessary to keep a "logbook": a file that stores all the information about a given simulation (not the output, but the parameters that led to that output and the corresponding git commit).

I've searched around, and XML seems like a good option, since it can easily be parsed with Python, Mathematica or other analysis software.

I wonder if anyone agrees with this, or has a better option.

I would also like to know how I can get the current git commit so I can save it in the logbook.

Jorge Leitao asked Nov 03 '11 10:11


3 Answers

In general I agree with you:

  • XML is widely deployed, there's tonnes of tools to bring the logbook into shape.
  • It's flexible, you can add additional attributes later without breaking old "scripts"
  • It's file based, one document, one file, use the filesystem to organise logbook "pages"
  • It's file based and plain text, tools like find, grep, diff (at a push) can help you in urgent cases
  • It's your own solution, you're free to track any information you need, and if you deem it essential to associate sunlight hours with the parameters, do it.
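As a rough illustration of such a one-file-per-run logbook page, here is a minimal sketch that builds an XML entry from a commit identifier and a set of parameters. The element and attribute names (simulation, parameter, name, value) and the function names are assumptions made up for this example, not an established schema.

```cpp
#include <map>
#include <sstream>
#include <string>

// Escape the five characters that are special in XML text and attributes.
std::string xml_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Build one logbook entry as an XML string.
// The tag and attribute names are hypothetical; pick whatever suits your analysis scripts.
std::string logbook_entry(const std::string& commit,
                          const std::map<std::string, std::string>& params) {
    std::ostringstream xml;
    xml << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    xml << "<simulation commit=\"" << xml_escape(commit) << "\">\n";
    for (const auto& kv : params) {
        xml << "  <parameter name=\"" << xml_escape(kv.first)
            << "\" value=\"" << xml_escape(kv.second) << "\"/>\n";
    }
    xml << "</simulation>\n";
    return xml.str();
}
```

The returned string can be written with std::ofstream to a file named after the run, so that each run gets its own "page" and find, grep and diff keep working on it.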

That being said, I should add that the right storage format depends on the typical use case. If you need to find out why the optimiser cannot find any solutions every Monday after a full moon, it will be hard (well, harder) to come up with the necessary XPath/XQuery hackery, because your structure follows no fixed schema.

The downsides I can think of:

  • It's verbose, XML documents in my area tend to be more like 20 to 40 GB whereas the info could probably be represented in more like 500 MB.
  • It's slow (depending on how you use it), RDBMSs or even NoSQL solutions employ techniques like indexing to make reading faster.
  • It's flexible, which is also a downside: if you happen to add two new attributes per day you will end up with nothing but marked-up free text, and it will need thorough polishing if you want to import it into structure-focussed systems (SQL, CSV, JSON, ...)
  • It's your own solution, you have to write it and maintain it

As for the second bit: git describe --always HEAD
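To call that command from inside the simulation itself, one possibility is to read its output through a pipe. The sketch below assumes a POSIX system (popen/pclose), that git is on the PATH, and that the program runs inside the repository; run_command and current_commit are hypothetical helper names.

```cpp
#include <cstdio>
#include <string>

// Run a shell command and return its standard output with trailing
// newlines stripped. Returns an empty string if the command cannot be started.
std::string run_command(const std::string& cmd) {
    std::string output;
    FILE* pipe = popen(cmd.c_str(), "r");  // POSIX only, not standard C++
    if (!pipe) return output;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    while (!output.empty() &&
           (output.back() == '\n' || output.back() == '\r')) {
        output.pop_back();
    }
    return output;
}

// Ask git for an identifier of the current commit.
// Falls back to "unknown" when git is missing or this is not a repository.
std::string current_commit() {
    std::string id = run_command("git describe --always HEAD 2>/dev/null");
    return id.empty() ? "unknown" : id;
}
```

If the working tree may contain uncommitted changes, dropping the explicit HEAD and using git describe --always --dirty appends a -dirty suffix, which is worth recording since the commit alone would then not reproduce the run.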

hroptatyr answered Nov 04 '22 10:11


The easiest option is to make your program a pure function, i.e. externalize all changing and possibly changing parameters into program options so that a simulation is completely specified by the options and a git commit identifier.

Boost.Program_options aids greatly in implementing such a scheme.
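To show the idea without any dependency, here is a minimal sketch that uses hand-rolled parsing of --name=value arguments in place of Boost.Program_options. It is only meant to illustrate the scheme, not to replace a real option parser; parse_options and with_defaults are hypothetical names, and all values are kept as strings.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Options;

// Collect arguments of the form --name=value into a map.
// Arguments in any other form are ignored in this sketch.
Options parse_options(int argc, const char* const argv[]) {
    Options opts;
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        std::string::size_type eq = arg.find('=');
        if (arg.compare(0, 2, "--") == 0 &&
            eq != std::string::npos && eq > 2) {
            opts[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
        }
    }
    return opts;
}

// Overlay the given options on the defaults, so the result lists every
// parameter the run actually used, not only the ones typed on the command line.
Options with_defaults(Options defaults, const Options& given) {
    for (const auto& kv : given) {
        defaults[kv.first] = kv.second;
    }
    return defaults;
}
```

In main, with_defaults(defaults, parse_options(argc, argv)) yields the complete parameter set; storing that map together with the commit identifier is then enough to repeat the run.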

thiton answered Nov 04 '22 08:11


This may sound odd on a programming site, but while doing several bits of simulation work I found that the best log book was... well... a log book.

Specifically, I've used this one extensively (link to Amazon). It may be because I came from a wet lab/biology background, but I found something appealing about an old dead tree notebook. It's admittedly not automated, and won't do well if you're running a huge number of different parameter combinations or if your simulation has a large number of parameters to begin with.

But for the project I was working on, which has ~ 20 or so parameters that might vary, I liked being able to record freeform notes about my thoughts, have them in an easily portable, easy to recall and fairly durable form, and for many fellow lab mates, "Keep a lab notebook" seemed to work better with a physical thing.

Your mileage may, of course, vary.

Fomite answered Nov 04 '22 09:11