How does Git(Hub) handle possible collisions from short SHAs?

Both Git and GitHub display short versions of SHAs -- just the first 7 characters instead of all 40 -- and both Git and GitHub support taking these short SHAs as arguments.

E.g. git show 962a9e8

E.g. https://github.com/joyent/node/commit/962a9e8

Given that the possibility space is now many orders of magnitude smaller, "just" 268 million (16^7), how do Git and GitHub protect against collisions here? And how do they handle them?
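To put a number on the concern, here is a rough sketch of the arithmetic. It assumes object IDs behave like uniformly random values and uses the standard birthday-bound approximation; the function names are made up for illustration and are not part of Git.

```python
import math


def prefix_space(hex_digits: int) -> int:
    """Number of distinct prefixes of the given length (16 values per hex digit)."""
    return 16 ** hex_digits


def collision_probability(num_objects: int, hex_digits: int) -> float:
    """Approximate chance that at least two of num_objects uniformly random
    hashes share the same prefix (birthday-bound approximation)."""
    space = prefix_space(hex_digits)
    return 1.0 - math.exp(-num_objects * (num_objects - 1) / (2.0 * space))


print(prefix_space(7))  # 268435456

# Approximate probability of at least one shared 7-character prefix
# for repositories of a few different sizes.
for n in (1_000, 10_000, 100_000):
    print(n, collision_probability(n, 7))
```

Under these assumptions, two objects sharing a 7-character prefix stops being rare once a repository holds tens of thousands of objects.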

Aseem Kishore asked Aug 19 '11 23:08



2 Answers

These short forms exist only to simplify visual recognition and to make your life easier. Git doesn't truncate anything; internally, everything is handled with the complete 40-character value. You can use a partial SHA-1 at your convenience, though:

Git is smart enough to figure out what commit you meant to type if you provide the first few characters, as long as your partial SHA-1 is at least four characters long and unambiguous — that is, only one object in the current repository begins with that partial SHA-1.
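The rule quoted above can be sketched as a simple prefix lookup. This is only an illustration of the "at least four characters and unambiguous" rule, not Git's actual implementation; `resolve_prefix`, `AmbiguousPrefix`, and the object IDs below are hypothetical.

```python
MIN_PREFIX_LEN = 4  # minimum length stated in the quote above


class AmbiguousPrefix(Exception):
    """Raised when more than one object ID starts with the given prefix."""


def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Return the single full ID that starts with prefix, or raise an error."""
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX_LEN:
        raise ValueError(f"prefix {prefix!r} is shorter than {MIN_PREFIX_LEN} characters")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object starts with {prefix!r}")
    if len(matches) > 1:
        raise AmbiguousPrefix(f"short SHA1 {prefix} is ambiguous")
    return matches[0]


# Hypothetical 40-character IDs, built so that two of them share the prefix "0001".
IDS = [
    "0001" + "a" * 36,
    "0001" + "f" * 36,
    "962a9e8" + "0" * 33,
]

print(resolve_prefix("962a9e8", IDS))  # unique prefix: resolves to the full ID

try:
    resolve_prefix("0001", IDS)
except AmbiguousPrefix as exc:
    print("error:", exc)
```

The key point is that the full IDs are always what is stored and compared; the short form is only a lookup key, and the lookup fails loudly rather than guessing when the key matches more than one object.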

emboss answered Oct 07 '22 15:10


I have a repository that has a commit with an id of 000182eacf99cde27d5916aa415921924b82972c.

git show 00018 

shows the revision, but

git show 0001 

prints

error: short SHA1 0001 is ambiguous.
error: short SHA1 0001 is ambiguous.
fatal: ambiguous argument '0001': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions

(If you're curious, it's a clone of the git repository for git itself; that commit is one that Linus Torvalds made in 2005.)

Keith Thompson answered Oct 07 '22 13:10