What would happen if two Git commits had the same SHA-1 hash?

Tags:

git

Let me preface this by saying that I am aware of the minuscule odds of this happening. I know that it would be more or less impossible to manufacture, and extremely unlikely to happen "in the wild." This is simply a what-if question about the internals of Git.

So, here is my question: what would happen if two Git commit hashes were identical? For starters:

Would the commit succeed?
Could it later be checked out as a detached HEAD?
Would a subsequent commit be possible?

702

asked Sep 16 '15 22:09

Ben

2 Answers

My old answer, "How would git handle a SHA-1 collision on a blob?", would still apply, even for a commit rather than a blob.
As torek mentions in the comments, Git just thinks of everything as "objects", each with its own SHA-1.

(Image from the "Git Internals - Git References" chapter of the Pro Git book, v2)
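
As a rough illustration of why the object type does not change the picture: Git names an object by hashing a short header (the type and the size) followed by the content, so commits, trees and blobs all live in one namespace. The sketch below only builds that byte string and does not compute the SHA-1 itself. frame_object() is a hypothetical helper written for this answer, not a function from the Git source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the byte string that gets hashed for an object:
 *   "<type> <size>\0<payload>"
 * Returns a malloc'd buffer (caller frees) and stores its length in *out_len.
 * Hypothetical helper for illustration only.
 */
unsigned char *frame_object(const char *type, const void *payload,
                            size_t payload_len, size_t *out_len)
{
    char header[64];
    int hdr_len;
    size_t total;
    unsigned char *buf;

    assert(type != NULL && out_len != NULL);

    /* snprintf returns the header length without the trailing NUL */
    hdr_len = snprintf(header, sizeof(header), "%s %zu", type, payload_len);
    if (hdr_len < 0 || (size_t)hdr_len >= sizeof(header))
        return NULL;

    total = (size_t)hdr_len + 1 + payload_len;   /* +1 keeps the NUL separator */
    buf = malloc(total);
    if (!buf)
        return NULL;

    memcpy(buf, header, (size_t)hdr_len + 1);    /* header plus its NUL */
    if (payload_len)
        memcpy(buf + hdr_len + 1, payload, payload_len);
    *out_len = total;
    return buf;
}
```

Calling frame_object("commit", text, len, &n) and frame_object("blob", text, len, &n) on the same text yields different byte strings, and therefore different hashes. A collision between any two objects, whatever their types, is still a collision on the same kind of key.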

While the commit would likely not succeed (there are a couple of checks in git-commit-tree.c), you also have to consider the case where two commits with the same SHA1 (and somehow different content) are created in repos A and B... and repo A is fetching repo B!
Commit 8685da4 (March 2007, git 1.5.1) took care of that, and the fetch would fail.
Commit 0e8189e (Oct. 2008, git 1.6.1) does mention that, with index V2:

the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero.

It still implements a packed object CRC check when unpacking objects.
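
To make the idea of a per-object CRC check concrete, here is a minimal sketch. It assumes the checksum is a standard CRC-32 (the same value zlib's crc32() produces) stored alongside each packed entry. The function names are hypothetical and this is not the code Git actually runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320. */
uint32_t crc32_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            /* shift right, XOR in the polynomial when the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/*
 * Hypothetical check: returns 1 when the stored checksum matches the raw
 * bytes of a packed entry, 0 when the entry looks corrupted.
 */
int packed_entry_crc_ok(const unsigned char *entry, size_t len,
                        uint32_t stored_crc)
{
    assert(entry != NULL || len == 0);
    return crc32_bytes(entry, len) == stored_crc;
}
```

A check like this catches bytes that were damaged on disk or in transit. It says nothing about two different, intact objects that happen to share a SHA-1, which is the case the question is about.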

The Git code mentioned below is the finalize_object_file() function. A blame shows no recent modification, with most of the code dating back to the very beginning of Git (2005). In the collision case, no new commit is created.

132

answered Oct 27 '22 00:10

VonC

According to the source code (as of git v2.17), if a commit led to an already existing SHA-1, this is what would happen on Linux (other operating systems might behave differently).

Would the commit succeed?

Yes and no: the git commit command would return as if it had succeeded, but the new commit object would not be created.

Could it later be checked out as a detached head?

No.

Reference: file sha1-file.c (commit fc1395f4a491a7da46a446233531005634eb979d)

int finalize_object_file(const char *tmpfile, const char *filename)
{
    int ret = 0;

    if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
        goto try_rename;
    else if (link(tmpfile, filename))
        ret = errno;

    /*
     * Coda hack - coda doesn't like cross-directory links,
     * ...
     */
    if (ret && ret != EEXIST) {
    try_rename:
        if (!rename(tmpfile, filename))
            goto out;
        ret = errno;
    }
    unlink_or_warn(tmpfile);
    if (ret) {
        if (ret != EEXIST) {
            return error_errno("unable to write sha1 filename %s", filename);
        }
        /* FIXME!!! Collision check here ? */
    }

out:
    if (adjust_shared_perm(filename))
        return error("unable to set permission to '%s'", filename);
    return 0;
}

The link() call fails with EEXIST, the temporary file is removed, and execution continues to the final return 0 (passing the FIXME comment and adjust_shared_perm(), which has no reason to fail).
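
The link-then-unlink behaviour can be reproduced outside Git with a small POSIX-only sketch. The helpers write_file(), read_file() and publish_object() are hypothetical names made up for this illustration. Unlike finalize_object_file(), publish_object() returns EEXIST instead of 0, so that the collision case is visible to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write text to path, replacing any previous content; 0 on success. */
int write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    size_t len, written;

    if (!f)
        return -1;
    len = strlen(text);
    written = fwrite(text, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Read up to buf_len - 1 bytes of path into buf and NUL-terminate it. */
int read_file(const char *path, char *buf, size_t buf_len)
{
    FILE *f;
    size_t n;

    if (buf_len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, buf_len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/*
 * Try to publish tmpfile under filename with a hard link, then drop tmpfile.
 * Returns 0 if the name was created, EEXIST if it already existed,
 * or another errno value on failure.
 */
int publish_object(const char *tmpfile, const char *filename)
{
    int ret = 0;

    assert(tmpfile != NULL && filename != NULL);
    if (link(tmpfile, filename))
        ret = errno;
    unlink(tmpfile);   /* the temporary file is removed either way */
    return ret;
}
```

If an "existing object" file is created first and a second file with different content is then published under the same name, publish_object() reports EEXIST, the temporary file is gone, and the existing file still holds its original content. The "new" object is silently discarded in the same way.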

answered Oct 27 '22 00:10

user803422
