The scenario
Imagine I am forced to work with some of my files always stored inside .zip files. Some of the files inside these ZIP files are small text files that change often, while others are larger but luckily rather static (e.g. images).
If I want to place these ZIP files inside a Git repository, each ZIP is treated as a single blob, so whenever I commit, the repository grows by the size of the whole ZIP file... even if only one small text file inside changed!
Why this is realistic
Microsoft Word 2007/2010 .docx and Excel .xlsx files are ZIP files...
What I want
Is there, by any chance, a way to tell Git to not treat ZIP files as files, but rather as directories and treat their contents as files?
The advantages
The repository would no longer grow by the full archive size on every commit, and diffs of the small text files inside would actually be meaningful.
But it couldn't work, you say?
I realize that without extra metadata this would lead to some amount of ambiguity: on a git checkout, Git would have to decide whether to create foo.zip/bar.txt as a file in a regular directory or a ZIP file. However, this could be solved through configuration options, I would think.
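A purely hypothetical illustration of such a configuration option (no such attribute exists in Git today; the attribute name and the fixtures.zip path are made up), sketched in the spirit of .gitattributes:

    # Hypothetical .gitattributes-style entries -- for illustration only:
    #
    #   *.zip          archive=zip    # materialize these paths as real ZIP archives on checkout
    #   fixtures.zip   -archive       # materialize this one as a plain directory instead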
Two ideas how it could be done (if it doesn't exist yet)
1. Teach Git to handle ZIP archives natively, using a library such as minizip or IO::Compress::Zip inside Git itself.
2. Have Git expand the ZIP file into a text format suitable for normal text diffing during git add / commit, and automatically zip it up again during checkout (see the sketch below). The text file would be composed of records, each representing one file in the ZIP archive.
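For the second idea, Git's existing clean/smudge filter mechanism already provides most of the plumbing. Here is a minimal sketch, where zip2txt and txt2zip are hypothetical helpers that would convert a ZIP archive (read from stdin) into such a record-based text file (written to stdout), and back again:

    # zip2txt / txt2zip are hypothetical converters, not existing tools.
    git config filter.ziptext.clean  "zip2txt"     # run on git add: ZIP -> text records
    git config filter.ziptext.smudge "txt2zip"     # run on checkout: text records -> ZIP
    echo "*.zip filter=ziptext" >> .gitattributes

With this in place the repository would store (and diff) the text representation, while the working tree would keep the real .zip files.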
This doesn't exist, but it could easily exist in the current framework. Just as Git already treats binary and ASCII files differently when performing a diff, it could be told to give special treatment to certain file types through the configuration interface.
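For diffs specifically, Git can already be told to do this today via a textconv driver. A small example that renders each archive as its unzip -v listing before diffing (this affects only diff output, not how the blobs are stored):

    git config diff.zip.textconv "unzip -v"
    echo "*.zip diff=zip" >> .gitattributes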
If you don't want to change the code base (although this is kind of a cool idea you've got), you could also script it for yourself by using pre-commit and post-checkout hooks to unzip and store the files, then return them to their .zip state on checkout. You would have to restrict the actions to only those files (blobs/index entries) that are specified by git add.
Either way is a bit of work -- it's just a question of whether the other Git commands are aware of what's going on and play nicely.
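A minimal sketch of that hook approach, assuming the archives sit at the top level of the work tree and ignoring filenames with spaces:

    #!/bin/sh
    # .git/hooks/pre-commit (sketch): replace each staged .zip with its unpacked
    # contents under <name>.zip.d/ so Git stores and diffs the individual files.
    for z in $(git diff --cached --name-only --diff-filter=ACM -- '*.zip'); do
        rm -rf "$z.d"
        unzip -q -o "$z" -d "$z.d"
        git rm --cached --quiet "$z"
        git add "$z.d"
    done

    #!/bin/sh
    # .git/hooks/post-checkout (sketch): rebuild each .zip from its unpacked
    # directory so the working tree contains the archives again.
    for d in *.zip.d; do
        [ -d "$d" ] || continue
        (cd "$d" && zip -q -r "../${d%.d}" .)
    done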
Use bup (presented in detail in GitMinutes #24).
It is the only Git-like system designed to deal with large (even very, very large) files, which means every version of a ZIP file will only grow the repo by its delta (instead of by a full additional copy).
The result is an actual Git repo that regular Git commands can read.
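A typical bup workflow looks roughly like this (the directory and branch names are just examples); the resulting repository under ~/.bup is a plain Git repository:

    bup init                                    # create the repository (~/.bup by default)
    bup index ~/documents                       # scan the directory holding the .zip files
    bup save -n documents ~/documents           # store a deduplicated snapshot on branch "documents"
    git --git-dir="$HOME/.bup" log documents    # inspect the result with plain Git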
I detail how bup differs from Git in "git with large files".
Any other workaround (like git-annex) isn't entirely satisfactory, as detailed in "git-annex with large files".