
Does Git store diff information in commit objects?

Tags:

git

diff

According to this:

It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.

Yet when I run git show $SHA1ofCommitObject...

    commit 4405aa474fff8247607d0bf599e054173da84113
    Author: Joe Smoe <joe.smoe@example.com>
    Date:   Tue May 1 08:48:21 2012 -0500

        First commit

    diff --git a/index.html b/index.html
    new file mode 100644
    index 0000000..de8b69b
    --- /dev/null
    +++ b/index.html
    @@ -0,0 +1 @@
    +<h1>Hello World!</h1>
    diff --git a/interests/chess.html b/interests/chess.html
    new file mode 100644
    index 0000000..e5be7dd
    --- /dev/null
    +++ b/interests/chess.html
    @@ -0,0 +1 @@
    +Did you see on Slashdot that King's Gambit accepted is solved! <a href="http://game

... it outputs the diff of the commit with the previous commits. I know that git doesn't store diffs in blob objects, but does it store diffs in commit objects? Or is git show dynamically calculating the diff?


asked May 01 '12 13:05

Alexander Bird

1 Answer

What the statement means is that most other version control systems need a point of reference in the past to be able to re-create the current commit.

For example, at some point in the past, a diff-based VCS (version control system) would have stored a full snapshot:

    x = snapshot
    + = diff

    History:
    x-----+-----+-----+-----(+)
                             Where we are now

So, in such a scenario, to re-create the state at (now), it would have to check out (x) and then apply the diff for each (+) until it reaches the present. Note that it would be extremely inefficient to store deltas forever, so every so often, delta-based VCSes store a full snapshot. Here's how it's done for Subversion.
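The replay described above can be sketched in a few lines of Python. This is only a toy model, not the storage format of Subversion or any other real tool: the names `apply_delta` and `reconstruct` are made up for illustration, and each "delta" here replaces whole files rather than individual lines.

```python
# Toy model of a delta-based VCS; not any real tool's storage format.
from typing import Dict, List, Optional

Snapshot = Dict[str, str]             # path -> full file content
Delta = Dict[str, Optional[str]]      # path -> new content, or None if deleted


def apply_delta(state: Snapshot, delta: Delta) -> Snapshot:
    """Return a new snapshot with one delta applied."""
    result = dict(state)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)    # file was removed in this revision
        else:
            result[path] = content    # file was added or changed
    return result


def reconstruct(base: Snapshot, deltas: List[Delta]) -> Snapshot:
    """Rebuild the latest state by replaying every delta since the base."""
    state = base
    for delta in deltas:
        state = apply_delta(state, delta)
    return state


# x is the last full snapshot; each entry in `history` is one (+).
x = {"index.html": "<h1>Hello World!</h1>\n"}
history = [
    {"interests/chess.html": "King's Gambit\n"},
    {"index.html": "<h1>Hello again!</h1>\n"},
    {"interests/chess.html": None},
]

now = reconstruct(x, history)
print(now)
```

The point to notice is that `reconstruct` cannot produce the current state without `x` and every delta in between, which is exactly the dependency on the past that the quoted statement is about.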

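Now, Git is different. Each commit references a tree, which in turn references complete blobs, so a single commit is sufficient to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot. This also answers the question: commit objects contain no diffs, and git show computes the diff on the fly by comparing the commit's snapshot with its parent's.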
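To make the snapshot model concrete, here is a rough Python sketch under simplifying assumptions. The `Commit` class, the `objects` dictionary and the `checkout` and `show` functions are hypothetical stand-ins, not Git's real data structures (a real tree is itself a stored object, and real commits carry author and message metadata). The only part taken directly from Git is how a blob ID is computed: the SHA-1 of a `blob <size>` header, a NUL byte, and the content.

```python
import difflib
import hashlib
from typing import Dict, List, Optional

objects: Dict[str, bytes] = {}        # toy object store: id -> content


def store_blob(content: bytes) -> str:
    """Store content under the SHA-1 Git would assign to it as a blob."""
    header = b"blob %d\x00" % len(content)
    oid = hashlib.sha1(header + content).hexdigest()
    objects[oid] = content
    return oid


class Commit:
    """A commit holds a complete path -> blob id mapping, plus a parent link."""

    def __init__(self, files: Dict[str, bytes], parent: Optional["Commit"] = None):
        self.tree = {path: store_blob(data) for path, data in files.items()}
        self.parent = parent


def checkout(commit: Commit) -> Dict[str, bytes]:
    """Recreate the files from this commit alone; no history is walked."""
    return {path: objects[oid] for path, oid in commit.tree.items()}


def show(commit: Commit) -> List[str]:
    """Compute a diff against the parent on demand; nothing is stored."""
    old = checkout(commit.parent) if commit.parent else {}
    new = checkout(commit)
    lines: List[str] = []
    for path in sorted(set(old) | set(new)):
        if old.get(path) == new.get(path):
            continue                  # unchanged file: same content, same blob
        before = old.get(path, b"").decode().splitlines()
        after = new.get(path, b"").decode().splitlines()
        lines += difflib.unified_diff(before, after, "a/" + path, "b/" + path, lineterm="")
    return lines


first = Commit({"index.html": b"<h1>Hello World!</h1>\n"})
second = Commit({"index.html": b"<h1>Hello Git!</h1>\n"}, parent=first)
print("\n".join(show(second)))
```

Neither commit stores a diff anywhere: `show` derives it each time from two complete snapshots, while `checkout` needs only the one commit it is given.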

So if that is the case, then where does the delta compression that git uses come in?

Well, it is nothing but a compression concept: there is no point in storing the same information twice if only a tiny amount has changed. Git therefore stores only what has changed, but keeps the result addressable as a full object, so that the commit it belongs to, which is in effect a tree of references, can still be re-created without looking at past commits. The thing is, though, that Git does not do this immediately after every commit, but rather on a garbage collection run. So, if Git has not yet run its garbage collection, you can see loose objects in your object database (.git/objects) with very similar content.

However, when Git runs its garbage collection (or when you call git gc manually), similar objects are delta-compressed against each other and written to a read-only pack file. You don't have to worry about running garbage collection manually: Git contains heuristics which tell it when to do so.
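As a rough illustration of why delta compression does not change the snapshot model, the sketch below stores one version of a file as copy and insert instructions against another, then rebuilds it and checks that its object ID is unchanged. This is not Git's pack format: Git uses its own binary encoding and its own heuristics for picking a base, and the helper names `make_delta` and `apply_delta` are invented here.

```python
import hashlib
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", offset, length) reuses bytes from the base;
# ("insert", data) supplies bytes the base does not have.
Instruction = Union[Tuple[str, int, int], Tuple[str, bytes]]


def blob_id(content: bytes) -> str:
    """SHA-1 that Git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def make_delta(base: bytes, target: bytes) -> List[Instruction]:
    """Describe `target` as copies from `base` plus literal insertions."""
    delta: List[Instruction] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are just not copied
    return delta


def apply_delta(base: bytes, delta: List[Instruction]) -> bytes:
    """Rebuild the full object from its base and the delta instructions."""
    out = bytearray()
    for instruction in delta:
        if instruction[0] == "copy":
            offset, length = instruction[1], instruction[2]
            out += base[offset:offset + length]
        else:
            out += instruction[1]
    return bytes(out)


v1 = b"<h1>Hello World!</h1>\n<p>Welcome to my page.</p>\n"
v2 = v1 + b"<p>I also like chess.</p>\n"

delta = make_delta(v1, v2)
rebuilt = apply_delta(v1, delta)
print(blob_id(rebuilt) == blob_id(v2))
```

The rebuilt object has the same ID as the original, so trees and commits that reference it are unaffected by how it happens to be stored. The delta is a storage detail between objects, not a link between commits.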

answered Sep 18 '22 23:09

Carl
