Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would Git handle a SHA-1 collision on a blob?

This probably never happened in the real-world yet, and may never happen, but let's consider this: say you have a git repository, make a commit, and get very very unlucky: one of the blobs ends up having the same SHA-1 as another that is already in your repository. Question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?

More a brain-teaser than an actual problem, but I found the issue interesting.

like image 361
Gnurou Avatar asked Feb 22 '12 09:02

Gnurou


People also ask

What happens if Git hash collision?

It will fail when creating the commit. If a tree object already exists and you make a commit object with the same hash, it will fail when updating the "ref".

Does Git still use SHA-1?

GIT strongly relies on SHA-1 for the identification and integrity checking of all file objects and commits. It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one.

Does SHA-1 have collisions?

“We note that classical collisions and chosen-prefix collisions do not threaten all usages of SHA-1." There are several potential scenarios in which the new collision could be implemented in an attack, the most likely of which is someone impersonating another user by creating an identical PGP key.

Does Git use SHA-1 or sha256?

At its core, the Git version control system is a content addressable filesystem. It uses the SHA-1 hash function to name content.


1 Answers

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.9~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160-bit to 4-bit by applying the following diff and rebuilding git:

--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c +++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c @@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou     blk_SHA1_Update(ctx, padlen, 8);      /* Output hash */ -   for (i = 0; i < 5; i++) -       put_be32(hashout + i * 4, ctx->H[i]); +   for (i = 0; i < 1; i++) +       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000)); +   for (i = 1; i < 5; i++) +       put_be32(hashout + i * 4, 0);  } 

Then I did a few commits and noticed the following.

  1. If a blob already exists with the same hash, you will not get any warnings at all. Everything seems to be ok, but when you push, someone clones, or you revert, you will lose the latest version (in line with what is explained above).
  2. If a tree object already exists and you make a blob with the same hash: Everything will seem normal, until you either try to push or someone clones your repository. Then you will see that the repo is corrupt.
  3. If a commit object already exists and you make a blob with the same hash: same as #2 - corrupt
  4. If a blob already exists and you make a commit object with the same hash, it will fail when updating the "ref".
  5. If a blob already exists and you make a tree object with the same hash. It will fail when creating the commit.
  6. If a tree object already exists and you make a commit object with the same hash, it will fail when updating the "ref".
  7. If a tree object already exists and you make a tree object with the same hash, everything will seem ok. But when you commit, all of the repository will reference the wrong tree.
  8. If a commit object already exists and you make a commit object with the same hash, everything will seem ok. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.
  9. If a commit object already exists and you make a tree object with the same hash, it will fail when creating the commit.

For #2 you will typically get an error like this when you run "git push":

error: object 0400000000000000000000000000000000000000 is a tree, not a blob fatal: bad blob object error: failed to push some refs to origin 

or:

error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000) 

if you delete the file and then run "git checkout file.txt".

For #4 and #6, you will typically get an error like this:

error: Trying to write non-commit object f000000000000000000000000000000000000000 to branch refs/heads/master fatal: cannot update HEAD ref 

when running "git commit". In this case you can typically just type "git commit" again since this will create a new hash (because of the changed timestamp)

For #5 and #9, you will typically get an error like this:

fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object 

when running "git commit"

If someone tries to clone your corrupt repository, they will typically see something like:

git clone (one repo with collided blob, d000000000000000000000000000000000000000 is commit, f000000000000000000000000000000000000000 is tree)  Cloning into 'clonedversion'... done. error: unable to read sha1 file of s (d000000000000000000000000000000000000000) error: unable to read sha1 file of tullebukk (f000000000000000000000000000000000000000) fatal: unable to checkout working tree warning: Clone succeeded, but checkout failed. You can inspect what was checked out with 'git status' and retry the checkout with 'git checkout -f HEAD' 

What "worries" me is that in two cases (2,3) the repository becomes corrupt without any warnings, and in 3 cases (1,7,8), everything seems ok, but the repository content is different than what you expect it to be. People cloning or pulling will have a different content than what you have. The cases 4,5,6 and 9 are ok, since it will stop with an error. I suppose it would be better if it failed with an error at least in all cases.

like image 109
rubund Avatar answered Oct 10 '22 13:10

rubund