
What is git actually doing when it says it is "resolving deltas"?

Tags:

git

People also ask

What does delta mean in git?

Git uses delta encoding, a way of storing or transmitting data as differences (deltas) between sequential data rather than as complete files. An object in a pack may be stored as a delta, i.e. a sequence of changes to make to some other object.

What is git delta compression?

Delta compression (also called delta encoding, or just delta coding) is where only the differences from a known base file are stored, discarding any similarities. To decompress, you apply the stored changes (also called "diffs") to the base file, leaving you with the new file.


The stages of git clone are:

  1. Receive a "pack" file of all the objects in the repo database
  2. Create an index file for the received pack
  3. Check out the head revision (for a non-bare repo, obviously)

"Resolving deltas" is the message shown for the second stage, indexing the pack file ("git index-pack").

Pack files do not have the actual object IDs in them, only the object content. So to determine what the object IDs are, git has to do a decompress+SHA1 of each object in the pack to produce the object ID, which is then written into the index file.
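To make the "decompress+SHA1" step concrete, here is a minimal Python sketch. The object ID is the SHA-1 of a small header (type, a space, the content length, a NUL byte) followed by the content. The helper names and the `(type, compressed, offset)` tuples are hypothetical stand-ins for illustration; this is not git's real pack or index format.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type."""
    # Header is "<type> <size>\0", hashed together with the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def build_index(packed_objects):
    """Build (object_id, offset) pairs for a simplified, hypothetical pack.

    packed_objects: iterable of (obj_type, zlib_compressed_content, offset).
    """
    entries = []
    for obj_type, compressed, offset in packed_objects:
        content = zlib.decompress(compressed)  # the "decompress" half
        entries.append((git_object_id(obj_type, content), offset))  # the "SHA1" half
    # Sorting by ID lets a lookup use binary search.
    return sorted(entries)


if __name__ == "__main__":
    pack = [
        ("blob", zlib.compress(b"hello\n"), 12),
        ("blob", zlib.compress(b""), 40),
    ]
    for object_id, offset in build_index(pack):
        print(object_id, offset)
```

The empty blob hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID git reports for an empty file, which is a quick way to check that the header format is right.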

An object in a pack file may be stored as a delta, i.e. a sequence of changes to make to some other object. In this case, git needs to retrieve the base object, apply the commands and SHA1 the result. The base object itself might have to be derived by applying a sequence of delta commands. (Even though in the case of a clone the base object will have been encountered already, there is a limit to how many manufactured objects are cached in memory.)
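The chained reconstruction described above can be sketched roughly as follows. This is a simplified model, not git's binary delta encoding: deltas are written as Python tuples of "copy" and "insert" commands, the pack is a plain dict, and the names `apply_delta`, `resolve` and `cache_limit` are all hypothetical.

```python
def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild an object from its base and a list of delta commands."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            # ("copy", offset, size): reuse a span of the base object.
            _, offset, size = op
            out += base[offset:offset + size]
        elif op[0] == "insert":
            # ("insert", data): add bytes that are not in the base.
            out += op[1]
        else:
            raise ValueError(f"unknown delta command: {op[0]!r}")
    return bytes(out)


def resolve(name, pack, cache, cache_limit=2):
    """Return the full content of `name`, resolving its delta chain."""
    if name in cache:
        return cache[name]
    entry = pack[name]
    if entry["base"] is None:
        data = entry["data"]  # stored whole, nothing to apply
    else:
        # The base may itself be a delta, so resolve it first.
        base = resolve(entry["base"], pack, cache, cache_limit)
        data = apply_delta(base, entry["ops"])
    # Bounded cache: evict the oldest entry once the limit is reached.
    if len(cache) >= cache_limit:
        cache.pop(next(iter(cache)))
    cache[name] = data
    return data


if __name__ == "__main__":
    pack = {
        "v1": {"base": None, "data": b"hello world\n"},
        "v2": {"base": "v1", "ops": [("copy", 0, 6), ("insert", b"git\n")]},
        "v3": {"base": "v2", "ops": [("copy", 0, 9), ("insert", b" deltas\n")]},
    }
    print(resolve("v3", pack, cache={}))
```

Because the cache is bounded, a base that was evicted has to be rebuilt from its own chain the next time something depends on it, which is the cost the parenthetical above refers to.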

In summary, the "resolving deltas" stage involves decompressing and checksumming the entire repo database, which, not surprisingly, takes quite a long time. Presumably, decompressing and calculating SHA1s takes more time than applying the delta commands.

In the case of a subsequent fetch, the received pack file may contain references (as delta object bases) to other objects that the receiving git is expected to already have. In this case, the receiving git actually rewrites the received pack file to include any such referenced objects, so that any stored pack file is self-sufficient. This might be where the message "resolving deltas" originated.
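As a rough illustration of making a received pack self-sufficient, the sketch below appends any delta base that the pack refers to but does not contain, taking it from the objects the receiver already has. The dict layout and the name `complete_thin_pack` are hypothetical; in git itself this job is done by `git index-pack` with its `--fix-thin` option.

```python
def complete_thin_pack(received, local_store):
    """Return a copy of `received` with all missing delta bases added.

    received:    {object_id: {"base": id_or_None, ...}}, the fetched pack
    local_store: {object_id: full_content_bytes}, objects already held locally
    """
    completed = dict(received)
    for object_id, entry in received.items():
        base = entry.get("base")
        if base is None or base in completed:
            continue  # not a delta, or its base is already in the pack
        if base not in local_store:
            raise KeyError(f"delta base {base} for {object_id} is not available")
        # Append the base as a whole (non-delta) object.
        completed[base] = {"base": None, "data": local_store[base]}
    return completed


if __name__ == "__main__":
    received = {"new": {"base": "old", "ops": [("copy", 0, 4)]}}
    local_store = {"old": b"some earlier content"}
    print(sorted(complete_thin_pack(received, local_store)))
```

After this step every delta in the pack has its base inside the same pack, so the pack can be read without consulting any other storage.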


Git uses delta encoding to store some of the objects in packfiles. However, you don't want to have to play back every single change ever on a given file in order to get the current version, so Git also has occasional snapshots of the file contents stored as well. "Resolving deltas" is the step that deals with making sure all of that stays consistent.

Here's a chapter from the "Git Internals" section of the Pro Git book, which is available online, that talks about this.


Amber seems to be describing the object model that Mercurial or similar uses. Git does not store the deltas between subsequent versions of an object, but rather full snapshots of the object, every time. It then compresses these snapshots using delta compression, trying to find good deltas to use, regardless of where in the history these exist.