According to this:
It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.
Yet when I run git show $SHA1ofCommitObject
...
commit 4405aa474fff8247607d0bf599e054173da84113
Author: Joe Smoe <[email protected]>
Date:   Tue May 1 08:48:21 2012 -0500

    First commit

diff --git a/index.html b/index.html
new file mode 100644
index 0000000..de8b69b
--- /dev/null
+++ b/index.html
@@ -0,0 +1 @@
+<h1>Hello World!</h1>
diff --git a/interests/chess.html b/interests/chess.html
new file mode 100644
index 0000000..e5be7dd
--- /dev/null
+++ b/interests/chess.html
@@ -0,0 +1 @@
+Did you see on Slashdot that King's Gambit accepted is solved! <a href="http://game
... it outputs the diff of the commit against the previous commit. I know that git doesn't store diffs in blob objects, but does it store diffs in commit objects? Or is git show dynamically calculating the diff?
When you commit, git stores a snapshot of each file in full; it does not store a diff from the previous commit. As a repository grows, the number of objects keeps growing, and it becomes inefficient to store each one as a separate loose object file. Hence, git packs them and stores them in a .pack file.
Git has a reputation for being confusing. Users stumble over terminology and phrasing that misguides their expectations. This is most apparent in commands that "rewrite history" such as git cherry-pick or git rebase.
To see the diff for a particular commit, where COMMIT is the hash of that commit: git diff COMMIT~ COMMIT will show you the difference between COMMIT's first parent and COMMIT itself.
What the statement means is that most other version control systems need a point of reference in the past to be able to re-create the current commit.
For example, at some point in the past, a diff-based VCS (version control system) would have stored a full snapshot:
x = snapshot
+ = diff

History:
x-----+-----+-----+-----(+)  Where we are now
So, in such a scenario, to re-create the state at (now), it would have to check out (x) and then apply the diffs for each (+) until it gets to now. Note that it would be extremely inefficient to store deltas forever, so every so often, delta-based VCSes store a full snapshot. Here's how it's done for Subversion.
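To make the replay step concrete, here is a minimal sketch of a delta-based history in Python. The delta format (a dict mapping a line index to its new text) and the names apply_delta and reconstruct are made up for illustration; no real VCS stores deltas this way.

```python
# Toy model of a delta-based history: one full snapshot, then a chain of deltas.
# The delta format (line index -> new text) is invented for this illustration.

def apply_delta(lines, delta):
    """Return a new list of lines with the delta's changes applied."""
    result = list(lines)
    for index, text in delta.items():
        if index < len(result):
            result[index] = text   # replace an existing line
        else:
            result.append(text)    # add a new line at the end
    return result


def reconstruct(snapshot, deltas, revision):
    """Rebuild revision N by replaying the first N deltas onto the snapshot."""
    lines = list(snapshot)
    for delta in deltas[:revision]:
        lines = apply_delta(lines, delta)
    return lines


snapshot = ["<h1>Hello</h1>"]        # the (x) in the diagram
deltas = [                           # each entry is one (+)
    {0: "<h1>Hello World!</h1>"},    # revision 1 edits line 0
    {1: "<p>Chess</p>"},             # revision 2 appends a line
    {1: "<p>Chess and Go</p>"},      # revision 3 edits line 1
]

latest = reconstruct(snapshot, deltas, 3)
print(latest)
```

The point to notice is that getting to the latest revision requires walking every delta since the snapshot; there is no way to read revision 3 on its own.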
Now, git is different. Git stores references to complete blobs and this means that with git, only one commit is sufficient to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot.
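Here is a rough Python sketch of that snapshot model, and of how a diff can be computed on demand from two snapshots, which is what git show is doing in the question. It is a simplification under stated assumptions: a "commit" here is just a flat dict of path to blob id, whereas real git uses commit and tree objects, and the function names (store_blob, commit_snapshot, checkout, show_diff) are hypothetical. The blob id calculation (SHA-1 over a "blob <size>\0" header plus the content) does match what git uses for blobs.

```python
import difflib
import hashlib

objects = {}  # toy object database: object id -> content bytes


def store_blob(content: bytes) -> str:
    """Store content under a git-style blob id (SHA-1 of header + content)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    oid = hashlib.sha1(header + content).hexdigest()
    objects[oid] = content  # identical content always lands on the same id
    return oid


def commit_snapshot(files: dict) -> dict:
    """Toy 'commit': a complete mapping of path -> blob id."""
    return {path: store_blob(data) for path, data in files.items()}


def checkout(snapshot: dict) -> dict:
    """Re-create all files from one snapshot, without touching any other commit."""
    return {path: objects[oid] for path, oid in snapshot.items()}


def show_diff(old: dict, new: dict) -> str:
    """Compute a unified diff on demand by comparing two snapshots."""
    out = []
    for path in sorted(set(old) | set(new)):
        if old.get(path) == new.get(path):
            continue  # same blob id means identical content
        a = objects[old[path]].decode().splitlines() if path in old else []
        b = objects[new[path]].decode().splitlines() if path in new else []
        out.extend(difflib.unified_diff(a, b, "a/" + path, "b/" + path, lineterm=""))
    return "\n".join(out)


first = commit_snapshot({"index.html": b"<h1>Hello World!</h1>\n"})
second = commit_snapshot({
    "index.html": b"<h1>Hello World!</h1>\n",
    "interests/chess.html": b"King's Gambit\n",
})

print(show_diff({}, first))      # diff of the first commit against "nothing"
print(show_diff(first, second))  # only the new file shows up
```

Nothing diff-shaped is stored anywhere in objects; the diff exists only while show_diff runs. Also note that the unchanged index.html is stored once, because both snapshots point at the same blob id.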
So if that is the case, then where does the delta compression that git uses come in?
Well, it is nothing but a compression concept: there is no point in storing the same information twice if only a tiny amount has changed. So git stores only what has changed, together with a reference to the object it was derived from, and the commit that the object belongs to, which is in effect a tree of references, can still be re-created without looking at past commits. The thing is, though, that git does not do this immediately after every commit, but rather on a garbage collection run. So, if git has not run its garbage collection, you can see objects in your object database (.git/objects) with very similar content.
However, when git runs its garbage collection (or when you call git gc manually), the similar objects are delta-compressed against each other and a read-only pack file is created. You don't have to worry about running garbage collection manually: git contains heuristics which tell it when to do so.
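As a rough illustration of delta compression as a pure storage detail, here is a small Python sketch that encodes one version of a file as "copy this range from the base" and "insert these literal bytes" instructions. This is not git's actual pack format; the instruction tuples and the names make_delta and expand_delta are assumptions made for the example, and difflib is used only as a convenient way to find matching ranges.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # offset and length in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops


def expand_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the full target content from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"line 1\nline 2\nline 3\n" * 20
target = base + b"one more line\n"

ops = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")

print(len(target), "bytes in full,", literal_bytes, "literal bytes in the delta")
```

Whoever asks for the object still gets the complete content back from expand_delta, so the snapshot model described above is unchanged; only the on-disk representation gets smaller.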