What would happen if two Git commits had the same SHA-1 hash?

Tags: git

Let me preface this by saying that I am aware of the minuscule odds of this happening. I know that it would be more or less impossible to manufacture, and extremely unlikely to happen "in the wild." This is simply a what-if question about the internals of Git.

So, here is my question: what would happen if two Git commit hashes were identical? For starters:

  • Would the commit succeed?
  • Could it later be checked out as a detached head?
  • Would a subsequent commit be possible?
Ben asked Sep 16 '15 22:09



2 Answers

My old answer "How would git handle a SHA-1 collision on a blob?" still applies, even for a commit rather than a blob.
As torek mentions in the comments, git just thinks of everything as "objects", each with its own SHA1.

https://git-scm.com/book/en/v2/book/10-git-internals/images/data-model-4.png

(Image from Git Internals - Git References chapter of the ProGit Book v2)

While the commit would likely not succeed (there are a couple of checks in git-commit-tree.c), you also have to consider the case where two commits with the same SHA1 (and somehow different content) are created in repos A and B... and repo A is fetching repo B!
Commit 8685da4 (March 2007, git 1.5.1) took care of that, and the fetch would fail.
Commit 0e8189e (Oct. 2008, git 1.6.1) does mention that, with index V2:

the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero.

It still implements a packed object CRC check when unpacking objects.

The Git code quoted in the answer below is the finalize_object_file() function. A blame shows no recent modification; most of the code dates back to the very beginning of Git (2005). In that code path, no new commit object is created.

VonC answered Oct 27 '22 00:10


According to the source code (as of git v2.17), if a commit led to an already existing SHA-1, this is what would happen on Linux (on other operating systems it might be different).

Would the commit succeed?

Yes and no: the git commit command would return as if it had succeeded, but the new commit object would not be created.

Could it later be checked out as a detached head?

No.

Reference: file sha1-file.c (commit fc1395f4a491a7da46a446233531005634eb979d)

int finalize_object_file(const char *tmpfile, const char *filename)
{
    int ret = 0;

    if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
        goto try_rename;
    else if (link(tmpfile, filename))
        ret = errno;

    /*
     * Coda hack - coda doesn't like cross-directory links,
     * ...
     */
    if (ret && ret != EEXIST) {
    try_rename:
        if (!rename(tmpfile, filename))
            goto out;
        ret = errno;
    }
    unlink_or_warn(tmpfile);
    if (ret) {
        if (ret != EEXIST) {
            return error_errno("unable to write sha1 filename %s", filename);
        }
        /* FIXME!!! Collision check here ? */
    }

out:
    if (adjust_shared_perm(filename))
        return error("unable to set permission to '%s'", filename);
    return 0;
}

The link() call fails with EEXIST, the temporary file is removed, and execution continues to the final return 0, passing through the FIXME comment and adjust_shared_perm(), which has no reason to fail.

user803422 answered Oct 27 '22 00:10