The scenario
Imagine I am forced to work with some of my files always stored inside .zip files. Some of the files inside these ZIP files are small text files that change often, while others are larger but luckily rather static (e.g. images).
If I want to place these ZIP files inside a Git repository, each ZIP is treated as a single blob, so whenever I commit, the repository grows by the size of the whole ZIP file... even if only one small text file inside changed!
Why this is realistic
Microsoft Word 2007/2010 .docx and Excel .xlsx files are ZIP files...
What I want
Is there, by any chance, a way to tell Git to not treat ZIP files as files, but rather as directories and treat their contents as files?
The advantages
The repository would no longer grow by the full archive size on every commit, and diffs of the small text files inside would actually be meaningful.
But it couldn't work, you say?
I realize that without extra metadata this would lead to some amount of ambiguity: on a git checkout, Git would have to decide whether to create foo.zip/bar.txt as a file in a regular directory or a ZIP file. However, this could be solved through configuration options, I would think.
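A purely hypothetical illustration of such a configuration option (no such attribute exists in Git today; the attribute name and the fixtures.zip path are made up), sketched in the spirit of .gitattributes:

    # Hypothetical .gitattributes-style entries -- for illustration only:
    #
    #   *.zip          archive=zip    # materialize these paths as real ZIP archives on checkout
    #   fixtures.zip   -archive       # materialize this one as a plain directory instead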
Two ideas how it could be done (if it doesn't exist yet)
1. Teach Git to handle ZIP archives natively, using a library such as minizip or IO::Compress::Zip inside Git itself.
2. Have Git expand the ZIP file into a text format suitable for normal text diffing during git add / commit, and automatically zip it up again during checkout (see the sketch below). The text file would be composed of records, each representing one file in the ZIP archive.
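For the second idea, Git's existing clean/smudge filter mechanism already provides most of the plumbing. Here is a minimal sketch, where zip2txt and txt2zip are hypothetical helpers that would convert a ZIP archive (read from stdin) into such a record-based text file (written to stdout), and back again:

    # zip2txt / txt2zip are hypothetical converters, not existing tools.
    git config filter.ziptext.clean  "zip2txt"     # run on git add: ZIP -> text records
    git config filter.ziptext.smudge "txt2zip"     # run on checkout: text records -> ZIP
    echo "*.zip filter=ziptext" >> .gitattributes

With this in place the repository would store (and diff) the text representation, while the working tree would keep the real .zip files.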
This doesn't exist, but it could easily exist in the current framework. Just as Git already treats binary and ASCII files differently when performing a diff, it could be told to give special treatment to certain file types through the configuration interface.
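For diffs specifically, Git can already be told to do this today via a textconv driver. A small example that renders each archive as its unzip -v listing before diffing (this affects only diff output, not how the blobs are stored):

    git config diff.zip.textconv "unzip -v"
    echo "*.zip diff=zip" >> .gitattributes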
If you don't want to change the code base (although this is kind of a cool idea you've got), you could also script it for yourself by using pre-commit and post-checkout hooks to unzip and store the files, then return them to their .zip state on checkout. You would have to restrict the actions to only those files (blobs/index entries) that are specified by git add.
Either way is a bit of work -- it's just a question of whether the other Git commands are aware of what's going on and play nicely.
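A minimal sketch of that hook approach, assuming the archives sit at the top level of the work tree and ignoring filenames with spaces:

    #!/bin/sh
    # .git/hooks/pre-commit (sketch): replace each staged .zip with its unpacked
    # contents under <name>.zip.d/ so Git stores and diffs the individual files.
    for z in $(git diff --cached --name-only --diff-filter=ACM -- '*.zip'); do
        rm -rf "$z.d"
        unzip -q -o "$z" -d "$z.d"
        git rm --cached --quiet "$z"
        git add "$z.d"
    done

    #!/bin/sh
    # .git/hooks/post-checkout (sketch): rebuild each .zip from its unpacked
    # directory so the working tree contains the archives again.
    for d in *.zip.d; do
        [ -d "$d" ] || continue
        (cd "$d" && zip -q -r "../${d%.d}" .)
    done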
Use bup (presented in detail in GitMinutes #24).
It is the only Git-like system designed to deal with large (even very, very large) files, which means every version of a ZIP file will only grow the repo by its delta (instead of by a full additional copy).
The result is an actual Git repo that regular Git commands can read.
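A typical bup workflow looks roughly like this (the directory and branch names are just examples); the resulting repository under ~/.bup is a plain Git repository:

    bup init                                    # create the repository (~/.bup by default)
    bup index ~/documents                       # scan the directory holding the .zip files
    bup save -n documents ~/documents           # store a deduplicated snapshot on branch "documents"
    git --git-dir="$HOME/.bup" log documents    # inspect the result with plain Git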
I detail how bup differs from Git in "git with large files".
Any other workaround (like git-annex) isn't entirely satisfactory, as detailed in "git-annex with large files".