How are Git commit IDs generated so that they uniquely identify commits? Example: <code>521747298a3790fde1710f3aa2d03b55020575aa</code> How does it work? Are they unique only within each project, or globally across all Git repositories?


What is a Git commit ID?


2 Answers

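Here's an example of a commit object file, decompressed. (The header, commit 238, is followed by a null byte. It is invisible when printed, which is why the header appears to run straight into the word tree.)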

commit 238tree 0de83a78334c64250b18b5191f6cbd6b97e77f84
parent 6270c56bec8b3cf7468b5dd94168ac410eca1e98
author Michael G. Schwern <schwern@pobox.com> 1659644787 -0700
committer Michael G. Schwern <schwern@pobox.com> 1659644787 -0700

feature: I did something cool

The commit ID is a SHA-1 hash of that.

$ openssl zlib -d < .git/objects/81/2e8c33de3f934cb70dfe711a5354edfd4e8172 | sha1sum
812e8c33de3f934cb70dfe711a5354edfd4e8172  -
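Not every OpenSSL build includes the zlib subcommand, so here is a minimal Python sketch of the same check. It assumes a repository that uses SHA-1 and an object stored as a loose file (not inside a packfile). The function names are made up for this illustration.

```python
import hashlib
import zlib
from pathlib import Path


def loose_object_id(path):
    """Decompress a loose object file and return the SHA-1 of its contents."""
    raw = zlib.decompress(Path(path).read_bytes())
    return hashlib.sha1(raw).hexdigest()


def matches_its_path(path):
    """A loose object lives at .git/objects/<first 2 hex digits>/<remaining 38>."""
    p = Path(path)
    return loose_object_id(p) == p.parent.name + p.name
```

Calling matches_its_path(".git/objects/81/2e8c33de3f934cb70dfe711a5354edfd4e8172") in the repository above should return True, since the file name is the hash of the decompressed contents.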

This includes...

Full content of the commit, not just the diff, represented as a tree object ID.
The ID of the previous commit (or commits if it's a merge).
Commit and author date.
Committer and author's name and email address.
Log message.

(The author is whoever originally wrote the change; the committer is whoever made the commit. These are usually the same person, but they can differ, for example when you rebase or amend a commit, or when you commit a patch someone emailed to you and want to credit them as the author.)

Change any of that and the commit ID changes. And yes, the same commit with the same properties will have the same ID on a different machine. This serves three purposes. First, it means the system can tell if a commit has been tampered with. It's baked right into the architecture.

Second, one can rapidly compare commits just by looking at their IDs. This makes Git's network protocols very efficient. Want to compare two commits to see if they're the same? Don't have to send the whole diff, just send the IDs.

Third, and this is the genius, two commits with the same ID have the same history. That's why the IDs of the parent commits are part of the hash. If the content of a commit is the same but the parents are different, the commit ID must be different. That means when comparing repositories (like in a push or pull), once Git finds a commit in common between the two repositories it can stop checking. This makes pushing and pulling extremely efficient. For example...

origin
A - B - C - D - E [master]

A - B [origin/master]

The network conversation for git fetch origin goes something like this...

local: Hey origin, what branches do you have?
origin: I have master at E.
local: I don't have E, I have your master at B.
origin: B you say? I have B and it's an ancestor of E. That checks out. Let me send you C, D and E.
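The "stop at the first common commit" idea can be sketched in a few lines of Python. This is a toy model with a hypothetical function name and a hand-written parent map, not Git's actual negotiation protocol. It only shows why knowing B is enough to skip everything behind B.

```python
def commits_to_send(parents, remote_tip, local_has):
    """Walk back from remote_tip and collect commits the other side lacks."""
    missing = []
    stack = [remote_tip]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit in seen or commit in local_has:
            # A known commit implies its whole history is known: stop here.
            continue
        seen.add(commit)
        missing.append(commit)
        stack.extend(parents.get(commit, []))
    missing.reverse()  # oldest first, for a linear history like this one
    return missing


# The history from the example: A - B - C - D - E
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
print(commits_to_send(parents, "E", {"B"}))  # ['C', 'D', 'E']
```

Note that the local side only had to mention B. A is never visited, because the walk stops as soon as it reaches a commit both sides share.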

This is also why when you rewrite a commit with rebase, everything after it has to change. Here's an example.

A - B - C - D - E - F - G [master]

Let's say you rewrite D, just to change the log message a bit. Now D can no longer be D, it has to be copied to a new commit we'll call D1.

A - B - C - D - E - F - G [master]
         \
          D1

While D1 can have C as its parent (C is unaffected, commits do not know their children) it is disconnected from E, F and G. If we change E's parent to D1, E can't be E anymore. It has to be copied to a new commit E1.

A - B - C - D - E - F - G [master]
         \
          D1 - E1

And so on with F to F1 and G to G1.

A - B - C - D - E - F - G
         \
          D1 - E1 - F1 - G1 [master]

They all have the same code, just different parents (or in D1's case, a different commit message).
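Here is a small Python sketch of that cascade. It is a toy under stated assumptions: commit_id and build_chain are made-up helpers, every commit uses the same placeholder tree ID, and the author and committer lines are left out, so the IDs are not ones real Git would produce. The point is only that each ID depends on its parent's ID.

```python
import hashlib


def commit_id(tree, parent, message):
    """Toy commit ID: hash of tree ID, parent ID and message (no author lines)."""
    body = f"tree {tree}\n"
    if parent:
        body += f"parent {parent}\n"
    body += f"\n{message}\n"
    data = body.encode()
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()


def build_chain(messages):
    """Return {label: id} for a linear history, each commit on top of the last."""
    ids = {}
    parent = None
    for label, message in messages:
        # Same placeholder tree everywhere: only parents and messages vary.
        parent = commit_id("0" * 40, parent, message)
        ids[label] = parent
    return ids


original = [(c, f"commit {c}") for c in "ABCDEFG"]
# Reword D only; E, F and G keep their messages.
rewritten = [(c, "reworded" if c == "D" else m) for c, m in original]

before, after = build_chain(original), build_chain(rewritten)
print([c for c in "ABCDEFG" if before[c] != after[c]])  # ['D', 'E', 'F', 'G']
```

A, B and C keep their IDs because nothing they hash has changed. D changes because of its message, and E, F and G change only because the parent ID they hash is different.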


answered Oct 06 '22 23:10

Schwern

You can see exactly what goes into making a commit id by running

git cat-file commit HEAD

It will give you something like

tree 07e239f2f3d8adc12566eaf66e0ad670f36202b5
parent 543a4849f7201da7bed297b279b7b1e9a086a255
author Justin Howard <justin.howard@example.com> 1426631449 -0700
committer Justin Howard <justin.howard@example.com> 1426631471 -0700

My commit message

It gives you:

1. A checksum of the tree contents
2. The parent commit id (if this is a merge, there will be more parents)
3. The author of the commit with timestamp
4. The committer of the commit with timestamp
5. The commit message

Git takes all of this and computes a SHA-1 hash of it. You can reproduce the commit id by running

(printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | sha1sum

This starts out by printing the string commit followed by a space, the byte count of the cat-file output, and a null byte. It then appends the cat-file output itself. All of that then gets run through sha1sum.
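The same framing can be written in Python if you prefer to see it spelled out. This is a sketch for SHA-1 repositories. The helper name git_object_id is made up here, and the empty blob is used only as a well-known value to check the framing against.

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 of '<kind> <byte count>', a null byte, then the body bytes."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Sanity check of the framing against a well-known ID: the empty blob.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If body holds the exact bytes printed by git cat-file commit HEAD, then git_object_id("commit", body) should match the output of git rev-parse HEAD.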

As you can see, there is nothing that identifies the project or repository in this information. This doesn't cause problems because it is astronomically unlikely for two different commits, in the same repository or in different ones, to end up with the same hash.
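To put a rough number on "astronomically unlikely", here is a back-of-the-envelope sketch using the birthday-bound approximation. It assumes SHA-1 output behaves like a uniformly random 160-bit value, so it covers accidental collisions only, not deliberately constructed ones. The function name and the figure of a trillion objects are illustrative choices, not measurements.

```python
from fractions import Fraction


def collision_probability(n_objects, hash_bits=160):
    """Birthday-bound approximation: about n^2 / 2^(bits + 1)."""
    return Fraction(n_objects * n_objects, 2 ** (hash_bits + 1))


# A trillion objects, which is a hypothetical figure chosen to be generous.
p = collision_probability(10**12)
print(float(p))
```

Even with that many objects, the estimated chance of any two sharing a hash by accident is on the order of one in 10^24.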

answered Oct 06 '22 23:10

Justin Howard


Tags:

git

git-commit

uniqueidentifier

git-svn

Ankur Loriya
