Let me preface this by saying that I am aware of the extremely minuscule odds of this happening. I know that it would be more or less impossible to manufacture, and extremely unlikely to happen "in the wild." This is simply a what-if question about the internals of Git.
So, here is my question: what would happen if two Git commit hashes were identical? For starters:
If two distinct objects have the same hash, this is known as a collision. Git can only store one half of the colliding pair, and when following a link from one object to the colliding hash name, it can't know which object the name was meant to point to. Two objects colliding accidentally is exceedingly unlikely.
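To put a rough number on "exceedingly unlikely", here is a minimal sketch of the usual birthday-bound estimate. It assumes hashes behave like uniformly random 160-bit values, and it only holds while the result is small. The function name `collision_probability` is made up for illustration, and the estimate says nothing about deliberately crafted collisions.

```c
#include <assert.h>

/*
 * Approximate probability that at least two of `n_objects` distinct objects
 * share a hash of `hash_bits` bits: (number of pairs) / 2^hash_bits.
 * Assumption: uniformly random hashes; only meaningful when the result is tiny.
 */
static double collision_probability(double n_objects, int hash_bits)
{
    assert(hash_bits > 0);
    double p = n_objects * (n_objects - 1.0) / 2.0;  /* number of object pairs */
    for (int i = 0; i < hash_bits; i++)
        p /= 2.0;                                    /* divide by 2^hash_bits */
    return p;
}
```

Calling `collision_probability(1e9, 160)` for a repository with a billion objects gives a value on the order of 3.4e-31.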
Hashes are what enable Git to share data efficiently between repositories. If two files are the same, their hashes are guaranteed to be the same. Similarly, if two commits contain the same files, have the same ancestors, and carry the same metadata (author, committer, dates, message), their hashes will be the same as well.
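Git uses the following information to generate the SHA-1 of a commit: the source tree of the commit (which unravels to all the subtrees and blobs), the SHA-1 of each parent commit, the author and committer information (including timestamps), and the commit message.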
Git uses the SHA-1 hash function to name content. For example, files, directories, and revisions are referred to by hash values, unlike in traditional version control systems where files or versions are referred to via sequential numbers.
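As a rough illustration of how content gets its name: Git hashes a short header, `<type> <size>` followed by a NUL byte, and then the object's payload. The sketch below reproduces that for an in-memory buffer. The helpers `sha1()` and `git_object_id()` are names invented for this example, not Git's own functions, and the SHA-1 routine is a bare-bones textbook version meant for reading, not for production use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Minimal SHA-1 over a complete in-memory message (illustration only). */
static void sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    size_t padded = ((len + 8) / 64 + 1) * 64;  /* room for 0x80 and 64-bit length */
    unsigned char *buf = calloc(padded, 1);
    uint64_t bits = (uint64_t)len * 8;

    assert(buf != NULL);
    memcpy(buf, msg, len);
    buf[len] = 0x80;                            /* mandatory padding bit */
    for (int i = 0; i < 8; i++)                 /* big-endian bit length at the end */
        buf[padded - 1 - i] = (unsigned char)(bits >> (8 * i));

    for (size_t off = 0; off < padded; off += 64) {
        uint32_t w[80];
        for (int i = 0; i < 16; i++)
            w[i] = (uint32_t)buf[off + 4 * i] << 24 |
                   (uint32_t)buf[off + 4 * i + 1] << 16 |
                   (uint32_t)buf[off + 4 * i + 2] << 8 |
                   (uint32_t)buf[off + 4 * i + 3];
        for (int i = 16; i < 80; i++)
            w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; i++) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            uint32_t t = rol(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rol(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    free(buf);
    for (int i = 0; i < 5; i++) {
        out[4 * i]     = (unsigned char)(h[i] >> 24);
        out[4 * i + 1] = (unsigned char)(h[i] >> 16);
        out[4 * i + 2] = (unsigned char)(h[i] >> 8);
        out[4 * i + 3] = (unsigned char)h[i];
    }
}

/* Hash "<type> <size>\0<payload>" and write the 40-character hex name to `hex`. */
static void git_object_id(const char *type, const void *data, size_t len, char hex[41])
{
    char header[64];
    /* +1 keeps the terminating NUL, which is part of the hashed header */
    size_t hlen = (size_t)snprintf(header, sizeof header, "%s %zu", type, len) + 1;
    unsigned char *store = malloc(hlen + len);
    unsigned char digest[20];

    assert(store != NULL);
    memcpy(store, header, hlen);
    memcpy(store + hlen, data, len);
    sha1(store, hlen + len, digest);
    free(store);
    for (int i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
}
```

For instance, `git_object_id("blob", "hello world\n", 12, hex)` should yield the same name that `echo "hello world" | git hash-object --stdin` prints. A commit is named the same way, with type `commit` and a payload made of the tree, parent, author, committer, and message lines, which is why a colliding commit is just another colliding object.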
My old answer "How would git handle a SHA-1 collision on a blob?" would still apply, even for a commit and not a blob.
As torek mentions in the comments, git just thinks of everything as "objects", each with their own SHA1.
(Image from Git Internals - Git References chapter of the ProGit Book v2)
While the commit would likely not succeed (there are a couple of checks in git-commit-tree.c), you also have to consider the case where two commits with the same SHA1 (and somehow different content) are created in repos A and B... and repo A is fetching repo B!
Commit 8685da4 (March 2007, git 1.5.1) took care of that, and the fetch would fail.
Commit 0e8189e (Oct. 2008, git 1.6.1) does mention that, with index V2:
the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero.
It still implements a packed object CRC check when unpacking objects.
The Git code mentioned below is the finalize_object_file() function, and a blame shows no recent modification, with most of the code dating back to the very beginning of Git (2005): no new commit is created.
According to the source code (present in git v2.17), if a commit leads to an already existing SHA-1, this is what would happen on Linux (for other operating systems it might be different).
Would the commit succeed?
Yes and no: the git commit command would return as if in success, but the new commit object would not be created.
Could it later be checked out as a detached head?
No.
Reference: file sha1-file.c (commit fc1395f4a491a7da46a446233531005634eb979d)
    int finalize_object_file(const char *tmpfile, const char *filename)
    {
        int ret = 0;

        if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
            goto try_rename;
        else if (link(tmpfile, filename))
            ret = errno;

        /*
         * Coda hack - coda doesn't like cross-directory links,
         * ...
         */
        if (ret && ret != EEXIST) {
        try_rename:
            if (!rename(tmpfile, filename))
                goto out;
            ret = errno;
        }
        unlink_or_warn(tmpfile);
        if (ret) {
            if (ret != EEXIST) {
                return error_errno("unable to write sha1 filename %s", filename);
            }
            /* FIXME!!! Collision check here ? */
        }

    out:
        if (adjust_shared_perm(filename))
            return error("unable to set permission to '%s'", filename);
        return 0;
    }
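The link fails with EEXIST, the temporary file is removed, and the code continues until the return 0 (through the FIXME, and the adjust_shared_perm() which has no reason to fail). Since the existing object file is never compared with the new one, the original object stays in place and the content of the new commit is discarded.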
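The effect of that code path can be reproduced outside Git with a small sketch. It assumes a POSIX system and a filesystem that supports hard links. The helpers `write_file()`, `read_file()`, and `finalize_sketch()` are hypothetical names for this example, and `finalize_sketch()` is a stripped-down stand-in for the link() branch only, not Git's actual function.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `content` to `path`, replacing any previous file. Returns 0 on success. */
static int write_file(const char *path, const char *content)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fputs(content, f);
    return fclose(f);
}

/* Read the file at `path` into `buf` as a NUL-terminated string. Returns 0 on success. */
static int read_file(const char *path, char *buf, size_t cap)
{
    assert(cap > 0);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    return fclose(f);
}

/*
 * Simplified stand-in for the link() path of finalize_object_file().
 * Returns 0 both when the object was stored and when `filename` already
 * existed. `*stored` is set to 1 only if the new content actually landed.
 */
static int finalize_sketch(const char *tmpfile, const char *filename, int *stored)
{
    int ret = 0;

    if (link(tmpfile, filename))
        ret = errno;
    unlink(tmpfile);            /* the temporary file is always removed */
    *stored = (ret == 0);
    if (ret && ret != EEXIST)
        return -1;              /* a real error */
    return 0;                   /* EEXIST is silently treated as success */
}
```

Writing "first commit" to a temporary file and finalizing it under some object name stores it. Doing the same with "second commit" under the same name also returns 0, but `stored` comes back as 0 and reading the object file still gives "first commit". A collision check at the FIXME would presumably compare the two files before discarding the temporary one.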