What is the best format to persist simple data frames to disc in R for storage while limiting semantic loss?
I ask because I'm archiving a data set. In an ideal world, my data format would have the following characteristics:
My first thought was to use CSV, which is very stable but lacks the semantic richness required. On the other hand, R's built-in RData format completely captures R's semantics, but seems likely to change between releases (correct me if I'm wrong).
Is there another format that finds a balance between these three imperatives?
Dump it to a text file with dput. That way you get all the structure of R's objects, and it's in a text-based form that, should R stop existing, can be parsed fairly easily.
It probably doesn't pass (3), your 'open standard' test.
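As a rough sketch of that round trip, the snippet below writes a small data frame with dput and reads it back with dget. The data frame, its column names, and the temporary file path are made up for illustration; they are not from the question.

```r
# Hypothetical example data; column names are illustrative only.
df <- data.frame(
  id    = 1:3,
  group = factor(c("a", "b", "a"), levels = c("a", "b")),
  when  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03"))
)

# Assumed location for the archive file.
path <- tempfile(fileext = ".R")

# Write a plain-text R expression that rebuilds the object,
# including factor levels, classes, and row names.
dput(df, file = path)

# Parse and evaluate that text to get the object back.
restored <- dget(path)

# Compare the restored object with the original.
same <- isTRUE(all.equal(df, restored))
```

The file is ordinary text, so it can be opened in any editor to see the `structure(list(...), class = "data.frame", ...)` call that dput wrote. One caveat worth checking on your own data: by default, doubles are written with a limited number of significant digits, so non-integer values may not come back bit-for-bit identical.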
R is pretty good for backward compatibility with its .RData format, so even if the files written by the latest R aren't the same as older ones, the latest R will still read old files. However, if R should stop existing, reverse-engineering the binary format would be orders of magnitude harder than grokking the output from dput.
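For comparison, here is a minimal sketch of the binary route using save and load. The data frame and file path are again hypothetical. The `version = 2` argument is an optional choice on my part: it asks save for the older serialization format, which is meant to stay readable by older R releases, and you can drop it to get the current default.

```r
# Hypothetical example data; column names are illustrative only.
df <- data.frame(id = 1:3, group = factor(c("a", "b", "a")))

# Assumed location for the archive file.
path <- tempfile(fileext = ".RData")

# Write the object in R's binary format, using the older format version.
save(df, file = path, version = 2)

# Load into a fresh environment so nothing in the workspace is overwritten.
env <- new.env()
loaded_names <- load(path, envir = env)
restored <- env$df
```

Unlike the dput output, this file is not human-readable, which is the trade-off described above.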