
Uncompress OpenOffice files for better storage in version control

I've heard that OpenOffice (ODF) files are compressed zip archives of XML and other data, so a tiny change to the document can change almost all of the compressed bytes. As a result, delta compression doesn't work well on them in version control systems.

I've done basic testing on an OpenOffice file, unzipping it and then rezipping it with zero compression. I used the Linux zip utility for my testing. OpenOffice will still happily open it.
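For reference, the same unzip-and-rezip experiment can be sketched in Python with the standard zipfile module. This is only a rough illustration, not a tested tool: the function name rezip_stored is made up here, and it assumes the file is a plain (unencrypted) zip-based ODF document. It keeps the members in their original order, since ODF expects the mimetype entry to come first.

```python
import zipfile


def rezip_stored(src_path, dst_path):
    """Copy every member of a zip-based ODF file into a new archive
    that uses no compression (ZIP_STORED). Hypothetical helper name."""
    with zipfile.ZipFile(src_path, "r") as src, \
         zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_STORED) as dst:
        # infolist() returns members in archive order, so "mimetype"
        # stays first if it was first in the source file.
        for info in src.infolist():
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            entry.compress_type = zipfile.ZIP_STORED
            entry.external_attr = info.external_attr
            dst.writestr(entry, src.read(info))
```

Calling rezip_stored("report.odt", "report-stored.odt") (both file names are just examples) writes a second file and leaves the original untouched.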

So I'm wondering if it's worth developing a small utility to run on ODF files each time just before I commit to version control. Any thoughts on this idea? Possible better alternatives?

Secondly, what would be a good and robust way to implement this little utility? Bash shell that calls zip (probably Linux only)? Python? Any gotchas you can think of? Obviously I don't want to accidentally mangle a file, and there are several ways that could happen.

Possible gotchas I can think of:

  • Insufficient disk space
  • Some other permissions issue that prevents writing the file or temporary files
  • ODF document is encrypted (probably should just leave these alone; the encryption probably also causes large file changes and thus prevents efficient delta compression)
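A minimal sketch of how a utility might guard against these gotchas, assuming Python and only the standard library. The function names are hypothetical. The encryption check is a heuristic based on the encryption-data entries that ODF records in META-INF/manifest.xml. The new archive is written to a temporary file in the same directory and only swapped in once it has been written and re-read successfully, so a full disk or a permissions error should leave the original file as it was.

```python
import os
import shutil
import tempfile
import zipfile


def is_encrypted_odf(path):
    """Heuristic: encrypted ODF members are described by
    manifest:encryption-data elements in META-INF/manifest.xml."""
    with zipfile.ZipFile(path) as zf:
        try:
            manifest = zf.read("META-INF/manifest.xml")
        except KeyError:
            return False
    return b"encryption-data" in manifest


def rezip_in_place(path):
    """Rewrite `path` without compression. Returns True if the file was
    rewritten, False if it was left alone (not a zip, or encrypted)."""
    if not zipfile.is_zipfile(path) or is_encrypted_odf(path):
        return False
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, \
             zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_STORED) as dst:
            for info in src.infolist():
                entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
                entry.external_attr = info.external_attr
                dst.writestr(entry, src.read(info))
        # Re-read the new archive and check CRCs before touching the original.
        with zipfile.ZipFile(tmp_path) as check:
            bad = check.testzip()
            if bad is not None:
                raise zipfile.BadZipFile("corrupt member after rewrite: " + bad)
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure (disk full, permissions, bad data): drop the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Any exception is re-raised after cleanup, so a pre-commit script built on this would stop instead of committing a half-written file.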
Craig McQueen asked Jun 10 '09 12:06


2 Answers

First, the version control system you want to use should support hooks that transform a file between the version stored in the repository and the one in the working area, for example the clean / smudge filters in Git, which are configured through gitattributes.
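To illustrate what a clean filter looks like: Git pipes the working-tree file to the filter's standard input and stores whatever the filter writes to standard output. Below is a rough Python sketch of such a filter, not the rezip script mentioned in this answer. The filter name "odf", the script name odf_clean.py, and the --filter flag are all invented for this example. Input that is not a zip archive is passed through unchanged.

```python
import io
import sys
import zipfile

# Hypothetical setup (names chosen for this sketch):
#   .gitattributes:   *.odt filter=odf
#   git config filter.odf.clean "python3 odf_clean.py --filter"


def store_uncompressed(data: bytes) -> bytes:
    """Return the same zip archive with every member stored uncompressed.
    Bytes that are not a zip archive are returned unchanged."""
    source = io.BytesIO(data)
    if not zipfile.is_zipfile(source):
        return data
    source.seek(0)
    out = io.BytesIO()
    with zipfile.ZipFile(source) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():  # archive order is preserved
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            entry.external_attr = info.external_attr
            dst.writestr(entry, src.read(info))
    return out.getvalue()


if __name__ == "__main__" and sys.argv[1:] == ["--filter"]:
    # Clean filter mode: stdin is the working-tree file, stdout goes to Git.
    sys.stdout.buffer.write(store_uncompressed(sys.stdin.buffer.read()))
```

With only a clean filter and no smudge filter, a fresh checkout would contain the uncompressed archive, which, going by the question's own test, OpenOffice still opens.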

Second, you can find such a filter instead of writing one yourself, for example rezip from the "Management of opendocument (openoffice.org) files in git" thread on the git mailing list (but see the warning in "Followup: management of OO files - warning about "rezip" approach").

You can also browse answers in "Tracking OpenOffice files/other compressed files with Git" thread, or try to find the answer inside "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.

Hope That Helps

Jakub Narębski answered Oct 14 '22 02:10


You may consider storing documents in FODT format, the flat XML variant of ODF.
This is a relatively new alternative.

The document is simply stored as a single, uncompressed XML file instead of a zip archive.

More info is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion.

sergtk answered Oct 14 '22 02:10