
Can you get a duplicate hash in Git, and what are the implications?

Tags:

git

My opinion is that it should be possible to get a duplicate Git hash: a hash is a condensed representation of its input, so some sequence of steps must produce the same hash twice. More importantly, there should be a sequence of steps in which different changes are committed yet produce the same hash.

For example, clone the same repository twice on the same machine, make almost exactly the same change in each clone (differing by one byte or bit), and commit. Even if the directory name or timestamp is included in the commit, a collision should still be possible, though admittedly rare. Another example: two people on two different machines committing at the same time.

My question is twofold: how can this happen, and how will Git handle it?

Or, more explicitly: how does Git ensure you are up to date before a push? Is it possible that one person pushes first, then the other tries to push (both changes based on the same parent commit), and Git sees that the hashes in the remote and local history match, decides you are good to go, and allows your push, but you have just lost one of your changes? I picture the situation as follows:

repo1 a->b->c1

repo2 a->b->c1'->c2

Say c1, c1' and c2 are all created after both repos were cloned at b. Repo1 pushes without problems. Repo2 then attempts to push c1' and c2, and Git determines that c1' = c1 when in fact they differ. Git pushes c2 on top of c1 to get a->b->c1->c2, and the change made in c1' is lost.

Is this possible? How could it happen, and what would Git do?

Coder asked Aug 02 '12 12:08


1 Answer

With regard to the part of your question that relates to duplicate hashes:

Git relies completely on the uniqueness of the hashes it generates and, as far as I know, has no safeguards for different data blobs that yield the same hash value. However, the chance of a hash collision occurring is vanishingly small and can be ignored in practice. If you are still worried, this passage from Pro Git may put your mind at rest:

A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
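
To put rough numbers on that claim, here is a minimal sketch of the standard birthday-bound approximation. It assumes SHA-1 behaves like an ideal 160-bit hash and that nobody is deliberately crafting collisions; the function name and the object counts are made up for illustration.

```python
import math

SHA1_BITS = 160  # a SHA-1 digest is 160 bits long


def collision_probability(num_objects: int, bits: int = SHA1_BITS) -> float:
    """Approximate chance that any two of num_objects random hashes collide.

    Uses the birthday bound p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = -num_objects * (num_objects - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


# Illustrative object counts: a large repo, an absurdly large repo, and 2^80
for n in (10**6, 10**9, 2**80):
    print(f"{n:.3e} objects -> p ~= {collision_probability(n):.3e}")
```

Under these assumptions the probability only becomes noticeable once the number of objects approaches 2^80, which is far beyond what any repository holds.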

As for the second part of your question (what happens), Pro Git says:

If you do happen to commit an object that hashes to the same SHA-1 value as a previous object in your repository, Git will see the previous object already in your Git database and assume it was already written. If you try to check out that object again at some point, you’ll always get the data of the first object.
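
The behaviour described above can be imitated with a toy content-addressed store. This is only a sketch, not Git's actual code: ToyObjectStore and find_collision are hypothetical names, and the IDs are truncated to 3 hex digits purely so that a collision can be found quickly. Real Git uses the full 40-digit SHA-1. The one part taken from Git itself is the blob ID, which is the SHA-1 of the header "blob <size>\0" followed by the content.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>\\0' plus the content, as Git does for blobs."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for Git's object database."""

    def __init__(self, id_length=40):
        self.id_length = id_length  # assumption: short IDs make collisions easy to find
        self.objects = {}

    def write(self, content: bytes) -> str:
        object_id = git_blob_id(content)[: self.id_length]
        # First write wins: an existing ID is assumed to already hold this data
        self.objects.setdefault(object_id, content)
        return object_id

    def read(self, object_id: str) -> bytes:
        return self.objects[object_id]


def find_collision(store):
    """Write distinct contents until two of them map to the same ID."""
    seen = {}
    counter = 0
    while True:
        content = f"change {counter}\n".encode()
        object_id = store.write(content)
        if object_id in seen:
            return seen[object_id], content, object_id
        seen[object_id] = content
        counter += 1


store = ToyObjectStore(id_length=3)  # 12-bit IDs, deliberately weak
first, second, shared_id = find_collision(store)
print("colliding contents:", first, second, "id:", shared_id)
print("read returns the first object:", store.read(shared_id) == first)
```

In this sketch the second, different content is silently dropped and reading the shared ID returns the first object, which mirrors the quoted description.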

codebox answered Oct 16 '22 22:10