
How is the Git hash calculated?

Tags: git, hash

I'm trying to understand how Git calculates the hash of refs.

$ git ls-remote https://github.com/git/git
....
29932f3915935d773dc8d52c292cadd81c81071d    refs/tags/v2.4.2
9eabf5b536662000f79978c4d1b6e4eff5c8d785    refs/tags/v2.4.2^{}
....

Clone the repo locally, then inspect the refs/tags/v2.4.2^{} ref by its SHA:

$ git cat-file -p 9eabf5b536662000f79978c4d1b6e4eff5c8d785
tree 655a20f99af32926cbf6d8fab092506ddd70e49c
parent df08eb357dd7f432c3dcbe0ef4b3212a38b4aeff
author Junio C Hamano <[email protected]> 1432673399 -0700
committer Junio C Hamano <[email protected]> 1432673399 -0700

Git 2.4.2

Signed-off-by: Junio C Hamano <[email protected]>

Copy the decompressed content so that we can hash it. (AFAIK Git uses the uncompressed version when it's hashing)

git cat-file -p 9eabf5b536662000f79978c4d1b6e4eff5c8d785 > fi 

Let's SHA-1 the content using Git's own hash command

$ git hash-object fi
3cf741bbdbcdeed65e5371912742e854a035e665

Why is the output not [9e]abf5b536662000f79978c4d1b6e4eff5c8d785? I understand the first two characters (9e) are the length in hex. How should I hash the content of fi so that I can get the Git ref abf5b536662000f79978c4d1b6e4eff5c8d785?

The user with no hat asked Feb 16 '16 10:02


2 Answers

As described in "How is git commit sha1 formed", the formula is:

(printf "<type> %s\0" $(git cat-file <type> <ref> | wc -c); git cat-file <type> <ref>)|sha1sum 

In the case of the commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 (which is v2.4.2^{}, and which references a tree):

(printf "commit %s\0" $(git cat-file commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 | wc -c); git cat-file commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 )|sha1sum 

That will give 9eabf5b536662000f79978c4d1b6e4eff5c8d785.

As would:

(printf "commit %s\0" $(git cat-file commit 'v2.4.2^{}' | wc -c); git cat-file commit 'v2.4.2^{}')|sha1sum 

(still 9eabf5b536662000f79978c4d1b6e4eff5c8d785)

Similarly, computing the SHA1 of the tag v2.4.2 would be:

(printf "tag %s\0" $(git cat-file tag v2.4.2 | wc -c); git cat-file tag v2.4.2)|sha1sum 

That would give 29932f3915935d773dc8d52c292cadd81c81071d.
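
The formula above can also be wrapped in a small shell function. This is only a sketch: git_object_hash is a hypothetical helper name (not a Git command), and it assumes sha1sum is installed (on macOS, shasum would be the usual substitute). It reads the raw object body from a file, so it can be tried without a repository:

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): hash a file the way Git hashes an object.
# Usage: git_object_hash <type> <file>
git_object_hash() {
    obj_type=$1
    obj_file=$2
    # Byte count of the body; tr strips the padding some wc implementations add.
    obj_size=$(wc -c < "$obj_file" | tr -d '[:space:]')
    # The header is "<type> <size>" followed by a NUL byte, then the raw body.
    { printf '%s %s\0' "$obj_type" "$obj_size"; cat "$obj_file"; } \
        | sha1sum | cut -d ' ' -f 1
}

# Example body: the 12 bytes "hello world\n", hashed as a blob.
sample=$(mktemp)
printf 'hello world\n' > "$sample"
git_object_hash blob "$sample"   # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
rm -f "$sample"
```

In a clone of git/git, saving the output of git cat-file commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 to a file and passing it to this helper with the type commit should reproduce the same ID, since it is the same computation as the one-liners above.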

VonC answered Oct 05 '22 16:10


There's a bit of confusion here. Git uses different types of objects: blobs, trees, commits and (annotated) tags. The following command:

git cat-file -t <hash> 

Tells you the type of object for a given hash. So in your example, the hash 9eabf5b536662000f79978c4d1b6e4eff5c8d785 corresponds to a commit object.

Now, as you figured out yourself, running this:

git cat-file -p 9eabf5b536662000f79978c4d1b6e4eff5c8d785 

Gives you the content of the object according to its type (in this instance, a commit).

But, this:

git hash-object fi 

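...computes the hash for a blob whose content is the output of the previous command (in your example), but it could be anything else (like "hello world!"). Here, try this (277 is the size of fi in bytes; printf is used because not every shell's echo turns \0 into a NUL byte):

(printf "blob 277\0"; cat fi) | shasum 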

The output is the same as the previous command. This is basically how Git hashes a blob. So by hashing fi, you are generating a blob object. But as we have seen, 9eabf5b536662000f79978c4d1b6e4eff5c8d785 is a commit, not a blob. So, you cannot hash fi as it is in order to get the same hash.
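
To see that the type named in the header is what makes the difference, the same bytes can be hashed under both headers. A minimal sketch, assuming sha1sum is installed; the file content is a placeholder, not the real commit body, and the variable names are only illustrative:

```shell
#!/bin/sh
# Placeholder content standing in for the file "fi" from the question.
body=$(mktemp)
printf 'placeholder body\n' > "$body"
size=$(wc -c < "$body" | tr -d '[:space:]')

# Same bytes, two different headers: only the type word changes.
as_blob=$({ printf 'blob %s\0' "$size"; cat "$body"; } | sha1sum | cut -d ' ' -f 1)
as_commit=$({ printf 'commit %s\0' "$size"; cat "$body"; } | sha1sum | cut -d ' ' -f 1)

echo "hashed as blob:   $as_blob"
echo "hashed as commit: $as_commit"
rm -f "$body"
```

The two lines print different IDs. Inside a repository, git hash-object -t commit fi tells Git to use the commit header instead of the default blob one, so for the fi file from the question it should print 9eabf5b536662000f79978c4d1b6e4eff5c8d785.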

A commit's hash is based on several other pieces of information that make it unique (such as the tree, the parent commits, the author, the committer, the dates and the message). The following article tells you exactly what a commit hash is made of:

The anatomy of a git commit

So you could get the same hash by providing all the data specified in the article with the exact same values as those used in the original commit.
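
As a rough illustration of that idea, a commit body can be assembled from its fields and hashed by hand. Every value below is made up for the example (the names, e-mail addresses and message are hypothetical; the tree ID is the well-known ID of the empty tree), and sha1sum is assumed to be available:

```shell
#!/bin/sh
# All values are illustrative; they do not describe a real commit.
tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904   # ID of the empty tree
author='A U Thor <author@example.com> 1432673399 -0700'
committer='C O Mitter <committer@example.com> 1432673399 -0700'
message='Example commit'

body=$(mktemp)
# A root commit has no "parent" line; any other commit adds one
# "parent <id>" line per parent right after the "tree" line.
printf 'tree %s\nauthor %s\ncommitter %s\n\n%s\n' \
    "$tree" "$author" "$committer" "$message" > "$body"

# Prefix the "commit <size>" header and a NUL byte, then SHA-1 the result.
size=$(wc -c < "$body" | tr -d '[:space:]')
commit_id=$({ printf 'commit %s\0' "$size"; cat "$body"; } | sha1sum | cut -d ' ' -f 1)
echo "$commit_id"
rm -f "$body"
```

Changing any single field (even one second of the timestamp) yields a different ID, which is why the values must match the original commit byte for byte.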

This might be helpful as well:

Git from the bottom up

Sulli answered Oct 05 '22 14:10