Why does Git use the SHA1 of the *compressed* objects rather than the SHA1 of the original objects?

I'm just curious why this choice was made. Because Git doesn't use the SHA1 of the raw blobs, it basically rules out changing the compression algorithm Git uses. Perhaps there is some efficiency consideration here. Maybe zlib is faster at compressing a file than SHA1 is at hashing it, so compressing before hashing is faster?

Here is a link to the original Git README by Linus: http://git.kernel.org/?p=git/git.git;a=blob;f=README;h=27577f76849c09d3405397244eb3d8ae1d11b0f3;hb=e83c5163316f89bfbde7d9ab23ca2e25604af290

And here is the relevant paragraph:

"There are several kinds of objects in the content-addressable collection database. They are all in deflated with zlib, and start off with a tag of their type, and size information about the data. The SHA1 hash is always the hash of the compressed object, not the original one."

jds asked Nov 26 '11 03:11


1 Answer

As you said, that is the original README, from when Git was first started. Git has since been changed so that the SHA-1 is computed before compression, not after.

It’s worth noting that the SHA-1 hash that is used to name the object is the hash of the original data plus this header, so 'sha1sum' file does not match the object name for file. (Historical note: in the dawn of the age of git the hash was the SHA-1 of the compressed object.)

http://schacon.github.com/git/user-manual.html#object-details
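To make the "original data plus this header" part concrete, here is a minimal Python sketch of how a blob's name can be computed, assuming the header for a blob has the form "blob <size>" followed by a NUL byte. The function names git_blob_id and loose_object_bytes are made up for this illustration and are not part of Git. The sketch compresses with zlib only after the hash input is assembled, so the compression level (or algorithm) has no effect on the object name.

```python
import hashlib
import zlib


def git_blob_id(data: bytes) -> str:
    """Hash the uncompressed header plus content (hypothetical helper)."""
    # Assumed header layout for a blob: type, space, decimal size, NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def loose_object_bytes(data: bytes, level: int = 6) -> bytes:
    """Deflate header plus content, as a stored object would be (hypothetical helper)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return zlib.compress(header + data, level)


content = b"hello world\n"

# The object name depends only on the uncompressed bytes.
print("object name:", git_blob_id(content))

# A plain SHA-1 of the file differs, because it lacks the header.
print("plain sha1: ", hashlib.sha1(content).hexdigest())

# Different compression levels may give different stored bytes,
# but both decompress to the same input that was hashed.
fast = loose_object_bytes(content, level=1)
best = loose_object_bytes(content, level=9)
print("same payload:", zlib.decompress(fast) == zlib.decompress(best))
```

If you have Git installed, you can compare the first printed value against the output of git hash-object on a file with the same content.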

manojlds answered Nov 15 '22 22:11