What types of binary files does Git keep deltas for?

Tags: git, git-lfs

We're dealing with a very large project that needs to be migrated to Git. Unfortunately, it also contains a large number of binaries, such as .zip and .dll files. At the moment, it's not possible to remove these binaries from the version control system.

I would like to find out more about how Git keeps deltas for binary files, and for which kinds of files, if any, it doesn't. I know this is configurable via the .gitattributes file, but do the file types need to be listed explicitly, or is there a predefined default set that Git recognizes and handles automatically?

975

asked Jan 17 '18 13:01

carlspring

1 Answer

First, let's get a bit of terminology out of the way. Files are stored as blob objects. These are one of four object types, the other three being commit, tree, and annotated tag.

Git's model is that all objects are logically independent. Everything is stored by its hash ID key, in a database. To retrieve any object, you start by knowing its hash ID, which you get from something or someone else.¹ You feed that hash ID to an object-getter, and it either looks up the object where it is stored directly, with no chance at delta compression at all—this is what Git calls a loose object—or, failing that, Git looks inside pack files, which pack multiple separate objects together and provide the opportunity for delta compression.²
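To make "stored by its hash ID key" concrete, here is a minimal Python sketch of how a blob's hash ID and loose-object form can be derived. It assumes a repository using the default SHA-1 object format; the function names are invented for this illustration and are not part of Git.

```python
import hashlib
import zlib
from typing import Tuple


def blob_header(content: bytes) -> bytes:
    # Git prefixes every blob with "blob <size>\0" before hashing or storing it.
    return b"blob " + str(len(content)).encode("ascii") + b"\0"


def blob_hash_id(content: bytes) -> str:
    """Return the SHA-1 hash ID Git assigns to a blob with this content."""
    return hashlib.sha1(blob_header(content) + content).hexdigest()


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (path relative to .git, on-disk bytes) for a loose blob."""
    hash_id = blob_hash_id(content)
    # A loose object is the header plus content, zlib-deflated as one unit.
    # Nothing in it refers to any other object, so no delta is possible.
    path = f"objects/{hash_id[:2]}/{hash_id[2:]}"
    return path, zlib.compress(blob_header(content) + content)


print(blob_hash_id(b"hello\n"))
```

The same ID is what `git hash-object` reports for a file with that content. Note that the file's name plays no part in the hash: the name lives in the tree object that points at the blob.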

What you're looking for, then, is information about which blob objects Git chooses to delta-compress against which other blob objects inside these pack files. The answer has evolved somewhat over time, so there is no single correct answer—but there are certain control knobs, including the .gitattributes one you mentioned.

The actual delta format is a modification of xdelta. It can, literally, compress (or "deltify") any binary data against any other binary data—but the results will be poor unless the inputs are well-chosen. It's the input choices that are the real key here. Git also has a technical documentation file describing how objects are chosen for deltification. This takes file path names, and especially final path component names, into account.

Note that if deltification fails to make the object smaller, the object is simply not delta-compressed. The object's original file size is also an input here, and core.bigFileThreshold (introduced in Git 1.7.6) sets a size value: files above this level are never deltified at all.
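The effect of base selection can be illustrated without Git at all. The sketch below does not use Git's delta format; it substitutes zlib's preset-dictionary feature as a rough stand-in, since both let the encoder copy byte runs out of a base instead of storing them again. The data and function names are made up for the demonstration.

```python
import random
import zlib


def size_with_base(target: bytes, base: bytes) -> int:
    """Size of `target` when compressed with `base` as a preset dictionary.

    A stand-in for deltification, not Git's actual delta encoding.
    """
    compressor = zlib.compressobj(level=9, zdict=base)
    return len(compressor.compress(target) + compressor.flush())


def stored_size(target: bytes, base: bytes) -> int:
    """Keep the base-relative form only if it beats plain compression."""
    plain = len(zlib.compress(target, 9))
    return min(plain, size_with_base(target, base))


rng = random.Random(0)  # fixed seed so the demo is repeatable
# Random bytes behave like already-compressed data (a zip, for example).
old_version = bytes(rng.getrandbits(8) for _ in range(4000))
new_version = old_version[:2000] + b"PATCHED" + old_version[2000:]
unrelated = bytes(rng.getrandbits(8) for _ in range(4000))

print("plain zlib:         ", len(zlib.compress(new_version, 9)))
print("against old version:", size_with_base(new_version, old_version))
print("against unrelated:  ", size_with_base(new_version, unrelated))
```

Against its own previous version the new data shrinks to a small fraction of its size, while against an unrelated base it gains nothing. That is why the choice of which objects to compare matters more than the file type.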

Hence, you can prevent Git from considering a file (object, really) for deltification in either of two ways:

- set core.bigFileThreshold low enough that the object counts as too big, or
- make the object's path name match a .gitattributes line that has -delta specified.
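As a sketch of the second option, a .gitattributes file along these lines would switch off deltification for the file types mentioned in the question. The patterns are only examples and should be adjusted to the project.

```gitattributes
# .gitattributes (example patterns, not a recommended set)
# Unset the "delta" attribute so Git never tries to deltify matching paths.
*.zip -delta
*.dll -delta
```

You can check how a given path resolves with `git check-attr delta -- some/path.dll` (the path here is a placeholder). For the first option, something like `git config core.bigFileThreshold 100m` lowers the size cutoff; the 100m value is purely illustrative, and the default is 512 MiB.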

Note that when using Git-LFS, large files are not stored in Git at all. Instead, a large file (as defined by the Git-LFS settings) is replaced (at git add time) by an indirect name. Git then stores this indirect name as the blob object (using the original file's path). When Git extracts the object, Git-LFS inspects it before allowing it to go into your work-tree. Git-LFS detects that the object's data were replaced with an indirect name, and retrieves the "real" data from another (separate, not-Git-at-all) server using the indirect name. So Git never sees the large file's data at all: instead, it sees only these indirect names.
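The "indirect name" is what Git LFS calls a pointer file: a few lines of text recording a spec version, the SHA-256 of the real content, and its size. A minimal Python sketch of building one follows; the function name is invented here, and a real Git LFS client does this for you through its clean filter.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the small text file Git stores in place of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


big_file = b"\x00" * (10 * 1024 * 1024)  # stand-in for a 10 MiB binary
pointer = lfs_pointer(big_file)
print(pointer)
print(len(pointer), "bytes stored in Git instead of", len(big_file))
```

Because the blob Git stores is only this short text, questions of deltification barely matter for LFS-tracked files: there is almost nothing left to compress.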

¹For instance, we might start with a branch name like master, which gets us the latest (or tip) commit hash ID. That hash ID gives us access to the commit object. The commit lists the hash ID of a tree. The tree, once we obtain it, lists the hash ID of some blob, along with the file's name. So now we know the hash ID for the version of README in the tip commit of master, if that's what we're looking for. Or, we use the commit data to find an older commit, which we use to find another even-older commit, and so on, until we arrive at the commit we want; and then we use the tree to find the blob IDs (and names) of files.

²Normally, an object can only be "deltified" against other objects in the same pack. For transport purposes, Git provides what it calls a thin pack in which objects can be delta-compressed against other objects that are omitted, but are assumed to be available on the other side of the transport mechanism. The other Git must "fatten up" the thin pack.

159

answered Nov 07 '22 10:11

torek
