Does it make any sense to somehow store an "uncompressed" version of normally-compressed files in the repository?
If so, is there a standard way to implement this? (Perhaps a standard pre-commit hook that uncompresses each such file into a specially-named folder, and a post-checkout hook that compresses each such specially-named folder back into the compressed file that LibreOffice knows how to read and write, something like the process described in "Should I decompress zips before I archive?") (Or perhaps hacking the code of the version control software to automagically decompress the old and new versions and store the diff between the decompressed files, falling back on the original behavior of storing the direct diff between the original files, or simply storing the whole file, if that fails or doesn't offer a significant improvement?)
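The hook idea in the question boils down to two operations: unpack a document into a folder that gets committed, and rebuild the document from that folder after checkout. A minimal sketch with Python's standard `zipfile` module, assuming an ODF file; the function names `explode` and `implode` and the `doc.odt.d` folder naming are hypothetical, and wiring them into the version control system's hooks is not shown.

```python
import zipfile
from pathlib import Path

def explode(archive: Path, folder: Path) -> None:
    """Unpack a document into a folder whose files can be committed as text."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(folder)

def implode(folder: Path, archive: Path) -> None:
    """Rebuild the document from the folder, putting "mimetype" first and
    uncompressed, as ODF packages expect."""
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    # Stable sort: "mimetype" moves to the front, everything else keeps its order.
    files.sort(key=lambda p: p.relative_to(folder).as_posix() != "mimetype")
    with zipfile.ZipFile(archive, "w") as zf:
        for path in files:
            name = path.relative_to(folder).as_posix()
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zf.write(path, name, compress_type=method)

# Usage (paths are placeholders):
# explode(Path("report.odt"), Path("report.odt.d"))   # before commit
# implode(Path("report.odt.d"), Path("report.odt"))   # after checkout
```

Empty directories and the original entry order (apart from `mimetype`) are not preserved, and the rebuilt file takes its timestamps from the working tree, so it will not be byte-identical to the original; that is acceptable here only because the folder, not the document, is what gets committed.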
I have a collection of OpenOffice / LibreOffice files that are frequently edited. I am storing them in a version-control repository, as recommended by "Should images be stored in a git repository?", although I happen to be using TortoiseHg or SourceTree to access my repositories rather than git.
I happen to know that OpenOffice files are actually zip-compressed containers with a few XML files inside. (I hear that many other popular application "binary file formats" are also some form of zip-compressed file.)
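The container structure is easy to inspect with Python's standard `zipfile` module. This sketch lists each entry with its compression method and sizes; the helper name `describe_container` and the file name `report.odt` are placeholders. In a typical ODF file the `mimetype` entry is stored uncompressed while the XML parts are deflated.

```python
import io
import zipfile

def describe_container(data: bytes) -> list[tuple[str, str, int, int]]:
    """Return (name, method, compressed size, uncompressed size) per entry."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename,
             methods.get(info.compress_type, str(info.compress_type)),
             info.compress_size,
             info.file_size)
            for info in zf.infolist()
        ]

# Usage with a real document (the path is a placeholder):
# with open("report.odt", "rb") as f:
#     for row in describe_container(f.read()):
#         print(row)
```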
My understanding is that even the smallest change to such "binary" files leads to the entire new file being stored in the repository, as opposed to small changes in "text" files, which lead to only the changes being stored and transmitted.
In theory, that would have the advantages of:
Git is not really supposed to handle compressed or binary files, and especially not larger files.
Unzipping is the act of extracting the files from a zipped single file or similar file archive. If the files in the package were also compressed -- as they usually are -- unzipping decompresses them.
During git add / commit the ZIP file will be automatically expanded to this text format for normal text diffing, and during checkout, it is automatically zipped up again. The text file is composed of records, each representing a file in the ZIP file.
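The record idea can be sketched in a few lines of Python. The layout below is hypothetical and is not the on-disk format of any particular tool: each record is a tab-separated header line (name, compression method, `T` for text or `B` for base64-encoded binary, payload line count) followed by the payload lines. `zip_to_text` would play the role of the commit-time expansion and `text_to_zip` the checkout-time re-zip.

```python
import base64
import io
import zipfile

def zip_to_text(data: bytes) -> str:
    """Expand a ZIP into diffable text records, one per entry."""
    out = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            raw = zf.read(info)
            try:
                payload, kind = raw.decode("utf-8"), "T"  # XML stays readable
            except UnicodeDecodeError:
                payload, kind = base64.encodebytes(raw).decode("ascii"), "B"
            lines = payload.split("\n")
            out.append(f"{info.filename}\t{info.compress_type}\t{kind}\t{len(lines)}")
            out.extend(lines)
    return "\n".join(out) + "\n" if out else ""

def text_to_zip(text: str) -> bytes:
    """Rebuild a ZIP from the records produced by zip_to_text."""
    lines = text.split("\n")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        i = 0
        while i < len(lines) - 1:  # last element is "" after the final newline
            name, ctype, kind, count = lines[i].rsplit("\t", 3)
            payload = "\n".join(lines[i + 1 : i + 1 + int(count)])
            raw = (payload.encode("utf-8") if kind == "T"
                   else base64.decodebytes(payload.encode("ascii")))
            zf.writestr(name, raw, compress_type=int(ctype))
            i += 1 + int(count)
    return buf.getvalue()
```

Entry names, contents, order and compression methods survive the round trip, but timestamps and the exact compressed bytes do not, so the rebuilt archive is generally not byte-identical to the original.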
Does it make any sense to somehow store an "uncompressed" version of normally-compressed files in the repository?
It makes sense especially if you need branching and diffing.
This old thread summarizes the situation.
- For OpenOffice documents whose size is dominated by embedded images and other large objects, the git delta mechanism already performs reasonably well, since OO files are Zip archives in which each file is compressed separately. If you do not change an image, that image remains stored in the same way and the delta can be done.
- For OO documents whose size is dominated by plain content, the git delta mechanism cannot work, since the zip compression introduces "mixing" and a small change in the document is converted into a very large change in the zip file.
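The "mixing" effect from the second bullet can be reproduced with Python's standard `zlib` module, which implements DEFLATE, the method zip uses for compressed entries. This is an illustration rather than a measurement of git: the helper `shared_bytes` (hypothetical) only counts the common prefix and suffix, whereas git's delta encoder can find matching runs anywhere. Even so, after the edit point the compressed stream typically changes at the bit level, so little of the old compressed data survives to be reused.

```python
import zlib

def shared_bytes(a: bytes, b: bytes) -> int:
    """Common prefix length plus common suffix length: a crude stand-in for
    how much of the old version a delta could reuse for the new one."""
    n = min(len(a), len(b))
    prefix = 0
    while prefix < n and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < n - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix + suffix

# Made-up "document" text, and the same text with one character inserted.
old = " ".join(f"paragraph {i} mentions value {i * i % 97}." for i in range(2000)).encode()
new = old[:10] + b"X" + old[10:]

old_z, new_z = zlib.compress(old), zlib.compress(new)
print("uncompressed:", shared_bytes(old, new), "of", len(old), "bytes shared")
print("compressed:  ", shared_bytes(old_z, new_z), "of", len(old_z), "bytes shared")
```

On the uncompressed text every original byte is shared; on the compressed versions only a small header-sized prefix matches.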
It could be possible to write a clean filter to uncompress the file before commit.
However, there is a trick with the complementary smudge filter, which is used at checkout: if you do not smudge properly, git always shows the file as changed with respect to the index.
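A git filter driver is a pair of commands that read the file on standard input and write the filtered version to standard output. It is typically configured with a `.gitattributes` line such as `*.odt filter=odf` plus `git config filter.odf.clean "python3 odf_filter.py clean"` and `git config filter.odf.smudge "python3 odf_filter.py smudge"`; the driver name `odf` and the script name `odf_filter.py` are placeholders. Instead of fully unpacking the document, this sketch swaps in a simpler variant: clean rewrites every entry uncompressed (`ZIP_STORED`) so the repository sees raw XML, and smudge deflates the entries again.

```python
import io
import sys
import zipfile

def repack(data: bytes, compress_type: int) -> bytes:
    """Rewrite every ZIP entry with one compression method, keeping entry
    order, names, timestamps and permission bits."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return data  # not a ZIP (e.g. empty content): pass it through unchanged
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(buf, "w") as dst:
        for info in src.infolist():
            new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new.external_attr = info.external_attr
            # ODF expects "mimetype" to stay the first entry and uncompressed.
            if info.filename == "mimetype":
                new.compress_type = zipfile.ZIP_STORED
            else:
                new.compress_type = compress_type
            dst.writestr(new, src.read(info))
    return buf.getvalue()

def main() -> None:
    # argv[1] selects the direction; anything else is a no-op.
    if len(sys.argv) == 2 and sys.argv[1] in ("clean", "smudge"):
        data = sys.stdin.buffer.read()
        method = zipfile.ZIP_STORED if sys.argv[1] == "clean" else zipfile.ZIP_DEFLATED
        sys.stdout.buffer.write(repack(data, method))

if __name__ == "__main__":
    main()
```

The smudged file deflates with Python's zlib defaults, not with whatever settings LibreOffice used, so it is not byte-identical to the file LibreOffice wrote.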
Smudging correctly would mean using the very same compression ratio and compression method that OO uses, which can be a little tricky. I have tried using the zip binary in both the clean and the smudge phases, and it does not work nicely: the smudged file is always different from the original one.
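Part of the reason a re-zipped file differs is that ZIP entries carry metadata besides the compressed data. Writing an entry by name with `zipfile.ZipFile.writestr` stamps the current local time, the "made by" field depends on the host, and the deflate output depends on the compression level and zlib version. This sketch (the helper `build_zip` is hypothetical) pins that metadata so the same input always produces the same bytes on a given Python/zlib build. It does not reproduce the exact bytes LibreOffice wrote, since LibreOffice's compressor settings are unknown here.

```python
import io
import zipfile

def build_zip(entries, date_time=(1980, 1, 1, 0, 0, 0), compresslevel=6) -> bytes:
    """Build a ZIP from (name, bytes) pairs whose output depends only on the
    arguments: no wall-clock timestamps, no host-specific fields."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:  # caller's order; ODF wants "mimetype" first
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.create_system = 3            # fixed "made by" OS field
            info.external_attr = 0o644 << 16  # fixed permission bits
            if name == "mimetype":
                info.compress_type = zipfile.ZIP_STORED
            else:
                info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data, compresslevel=compresslevel)
    return buf.getvalue()
```

Identical inputs give identical archives; changing only the timestamp is enough to change the bytes.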
One should probably work at a lower level (for example with libzip) to have finer control over what is happening, and prepend to the uncompressed file the compression parameters to be restored on smudging. The bigger issue, however, is that the clean/smudge approach can be really slow when dealing with large OO files.
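The "prepend the compression parameters" idea can be sketched without libzip by using Python's `zipfile` instead. Here clean writes a one-line JSON header recording each entry's original compression method, followed by an archive whose entries are all stored uncompressed; smudge reads the header and recompresses each entry with its recorded method. The `ZIPPARAMS ` marker and the header layout are hypothetical. The original compression level is not recorded, because a ZIP entry does not reliably carry it.

```python
import io
import json
import zipfile

MAGIC = b"ZIPPARAMS "  # hypothetical marker for the parameter header line

def clean(data: bytes) -> bytes:
    """Header with per-entry compression methods, then an uncompressed ZIP."""
    params = []
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(buf, "w") as dst:
        for info in src.infolist():
            params.append({"name": info.filename, "method": info.compress_type})
            new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new.external_attr = info.external_attr  # compress_type defaults to ZIP_STORED
            dst.writestr(new, src.read(info))
    # json.dumps without indent emits no raw newlines, so the header is one line.
    return MAGIC + json.dumps(params).encode() + b"\n" + buf.getvalue()

def smudge(data: bytes) -> bytes:
    """Strip the header and recompress each entry with its recorded method."""
    if not data.startswith(MAGIC):
        return data  # not produced by clean(): pass through unchanged
    header, _, body = data.partition(b"\n")
    methods = {p["name"]: p["method"] for p in json.loads(header[len(MAGIC):])}
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(body)) as src, zipfile.ZipFile(buf, "w") as dst:
        for info in src.infolist():
            new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new.external_attr = info.external_attr
            new.compress_type = methods.get(info.filename, zipfile.ZIP_DEFLATED)
            dst.writestr(new, src.read(info))
    return buf.getvalue()
```

Compared with forcing one method on every entry, this keeps entries that LibreOffice stored uncompressed (such as `mimetype`) stored after smudging.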