I'm trying to understand how Git calculates the hash of refs.
$ git ls-remote https://github.com/git/git
....
29932f3915935d773dc8d52c292cadd81c81071d    refs/tags/v2.4.2
9eabf5b536662000f79978c4d1b6e4eff5c8d785    refs/tags/v2.4.2^{}
....
Clone the repo locally and look up the refs/tags/v2.4.2^{} ref by its SHA:
$ git cat-file -p 9eabf5b536662000f79978c4d1b6e4eff5c8d785
tree 655a20f99af32926cbf6d8fab092506ddd70e49c
parent df08eb357dd7f432c3dcbe0ef4b3212a38b4aeff
author Junio C Hamano <[email protected]> 1432673399 -0700
committer Junio C Hamano <[email protected]> 1432673399 -0700

Git 2.4.2

Signed-off-by: Junio C Hamano <[email protected]>
Save the decompressed content to a file so that we can hash it (AFAIK Git uses the uncompressed version when hashing):
git cat-file -p 9eabf5b536662000f79978c4d1b6e4eff5c8d785 > fi
Let's SHA-1 the content using Git's own hash command:
$ git hash-object fi
3cf741bbdbcdeed65e5371912742e854a035e665
Why is the output not [9e]abf5b536662000f79978c4d1b6e4eff5c8d785? I understand the first two characters (9e) are the length in hex. How should I hash the content of fi so that I can get the Git ref abf5b536662000f79978c4d1b6e4eff5c8d785?
In Git, you can get a commit's tree hash with: git cat-file commit HEAD | head -n1. The commit hash is obtained by hashing the data you see with cat-file, prefixed with a "commit <size>\0" header. That data includes the tree object hash and commit information such as the author, committer, timestamps, commit message, and the parent commit hash if it's not the first commit.
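A minimal sketch of that check, assuming a throwaway repository created with `git init`, a hypothetical identity set through the standard `GIT_AUTHOR_*`/`GIT_COMMITTER_*` environment variables, and GNU coreutils `sha1sum`. It recomputes the ID of `HEAD` from the raw commit data and compares it with what `git rev-parse` reports:

```bash
#!/usr/bin/env bash
# Sketch: recompute a commit's ID from the raw data shown by `git cat-file`
# and compare it with Git's answer. Throwaway repo, hypothetical identity.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
git commit -q --allow-empty -m 'first commit'

# First line of the raw commit object: "tree <hash>".
tree_line=$(git cat-file commit HEAD | head -n1)

# Commit ID = SHA-1 of "commit <byte count>\0" followed by the raw content.
# Arithmetic expansion drops any padding that `wc` may add.
size=$(( $(git cat-file commit HEAD | wc -c) ))
computed=$( { printf 'commit %d\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d' ' -f1 )
expected=$(git rev-parse HEAD)

echo "$tree_line"
echo "computed: $computed"
echo "expected: $expected"
```

An empty first commit points at Git's empty tree, so the first line printed here is `tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904`.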
The algorithm that Git uses is SHA-1, a cryptographic hash function that takes arbitrary input and produces a 160-bit (20-byte) hash value, usually written as 40 hexadecimal characters.
A Git commit hash is a cryptographic checksum of the commit object: the hash of its tree (which in turn covers the hashes of all the files and directories in that snapshot), the hash of the parent commit(s), the author and committer together with their timestamps, and the commit message. You cannot choose the hash manually; it is fully determined by that content.
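To see how the timestamps feed into the hash, here is a sketch that builds a root commit object by hand (Git's empty tree, a hypothetical author "A U Thor", a hypothetical message) and hashes it twice, changing only the timestamp by one second. The helper names `commit_body` and `commit_id_at` are made up for illustration, and GNU coreutils `sha1sum` is assumed:

```bash
#!/usr/bin/env bash
# Sketch: hash a hand-built root commit twice, changing only the timestamp.
# Tree = Git's empty tree; identity and message are hypothetical.
set -euo pipefail

# Hypothetical helper: print the raw content of a root commit whose author
# and committer timestamp is $1 (seconds since the Unix epoch).
commit_body() {
  printf 'tree %s\n' 4b825dc642cb6eb9a060e54bf8d69288fbee4904
  printf 'author A U Thor <author@example.com> %s +0000\n' "$1"
  printf 'committer A U Thor <author@example.com> %s +0000\n' "$1"
  printf '\nsame message\n'
}

# Hypothetical helper: object ID = SHA-1 of "commit <size>\0" + content.
commit_id_at() {
  local size
  size=$(( $(commit_body "$1" | wc -c) ))
  { printf 'commit %d\0' "$size"; commit_body "$1"; } | sha1sum | cut -d' ' -f1
}

commit_id_at 1432673399   # same content, same ID every time
commit_id_at 1432673400   # one second later: a completely different ID
```

If Git is installed, `commit_body 1432673399 | git hash-object -t commit --stdin` should print the same ID as the first call.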
In a SHA-1 repository (the default), a full commit hash always has 40 hexadecimal characters. To make IDs easier to handle, Git also accepts a short version: any leading prefix of at least 4 characters, as long as it is unambiguous within the repository.
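A sketch of full and abbreviated names side by side, in a throwaway repository with a hypothetical identity. `git rev-parse --short` prints a short unambiguous prefix (7 characters by default in a small repository), and `git rev-parse` expands any unique prefix back to the full name:

```bash
#!/usr/bin/env bash
# Sketch: full vs. abbreviated object names in a throwaway repo.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
git commit -q --allow-empty -m 'first commit'

full=$(git rev-parse HEAD)            # 40 hex digits in a SHA-1 repository
short=$(git rev-parse --short HEAD)   # short unambiguous prefix
four=${full:0:4}                      # 4 digits is the minimum Git accepts

echo "full:  $full"
echo "short: $short"
# Every unambiguous prefix resolves back to the same full name.
git rev-parse "$short"
git rev-parse "$four"
```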
As described in "How is git commit sha1 formed", the formula is:
(printf "<type> %s\0" $(git cat-file <type> <ref> | wc -c); git cat-file <type> <ref>)|sha1sum
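The same formula can be wrapped in a small reusable function. This is a sketch: the name `git_object_id` is hypothetical, and it assumes GNU coreutils `sha1sum` (on macOS, `shasum`, which defaults to SHA-1, prints the same format). It hashes a file's bytes as an object of any type, which makes it easy to check against well-known IDs such as the empty blob and the empty tree:

```bash
#!/usr/bin/env bash
# Sketch of the formula above as a reusable function (hypothetical name).
set -euo pipefail

# git_object_id TYPE FILE: print the SHA-1 object ID that Git would assign
# to the bytes of FILE stored as an object of TYPE (blob, tree, commit, tag).
git_object_id() {
  local type=$1 file=$2 size
  # Arithmetic expansion drops the padding some `wc` implementations add.
  size=$(( $(wc -c < "$file") ))
  { printf '%s %d\0' "$type" "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'hello world\n' > "$tmp"
git_object_id blob "$tmp"   # same as: git hash-object "$tmp"
: > "$tmp"                  # truncate to zero bytes
git_object_id blob "$tmp"   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty blob
git_object_id tree "$tmp"   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```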
In the case of the commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 (which is v2.4.2^{}, and which references a tree):
(printf "commit %s\0" $(git cat-file commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 | wc -c); git cat-file commit 9eabf5b536662000f79978c4d1b6e4eff5c8d785 )|sha1sum
That will give 9eabf5b536662000f79978c4d1b6e4eff5c8d785.
As would:
(printf "commit %s\0" $(git cat-file commit 'v2.4.2^{}' | wc -c); git cat-file commit 'v2.4.2^{}')|sha1sum
(still 9eabf5b536662000f79978c4d1b6e4eff5c8d785)
Similarly, computing the SHA1 of the tag v2.4.2 would be:
(printf "tag %s\0" $(git cat-file tag v2.4.2 | wc -c); git cat-file tag v2.4.2)|sha1sum
That would give 29932f3915935d773dc8d52c292cadd81c81071d.
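This is also why `git ls-remote` lists two lines for refs/tags/v2.4.2: an annotated tag is an object of its own, and the `^{}` line is the commit it points to. A sketch reproducing that in a throwaway repository, with a hypothetical identity and a hypothetical tag name `v1.0`, and recomputing the tag object's ID with the same formula:

```bash
#!/usr/bin/env bash
# Sketch: an annotated tag ref has two IDs, the tag object and the peeled
# commit. Throwaway repo; identity and tag name are hypothetical.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
git commit -q --allow-empty -m 'first commit'
git tag -a -m 'release v1.0' v1.0

tag_id=$(git rev-parse v1.0)            # the tag object itself
commit_id=$(git rev-parse 'v1.0^{}')    # peeled: the tagged commit

# Same formula as for commits, with "tag" as the type.
size=$(( $(git cat-file tag v1.0 | wc -c) ))
computed=$( { printf 'tag %d\0' "$size"; git cat-file tag v1.0; } | sha1sum | cut -d' ' -f1 )

echo "tag object: $tag_id (recomputed: $computed)"
echo "peeled:     $commit_id"
```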
There's a bit of confusion here. Git uses different types of objects: blobs, trees, commits and (annotated) tags. The following command:
git cat-file -t <hash>
tells you the type of object for a given hash. So in your example, the hash 9eabf5b536662000f79978c4d1b6e4eff5c8d785 corresponds to a commit object.
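A sketch that asks `git cat-file -t` about one object of each type, in a throwaway repository with a hypothetical file, identity and tag name:

```bash
#!/usr/bin/env bash
# Sketch: `git cat-file -t` on each of Git's four object types.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
printf 'hello world\n' > hello.txt
git add hello.txt
git commit -q -m 'add hello.txt'
git tag -a -m 'release v1.0' v1.0

# HEAD:hello.txt -> blob, HEAD^{tree} -> tree, HEAD -> commit, v1.0 -> tag
for obj in HEAD:hello.txt 'HEAD^{tree}' HEAD v1.0; do
  printf '%-14s %s\n' "$obj" "$(git cat-file -t "$obj")"
done
```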
Now, as you figured out yourself, running this:
git cat-file -p 9eabf5b536662000f79978c4d1b6e4eff5c8d785
gives you the content of the object according to its type (in this instance, a commit).
But, this:
git hash-object fi
...computes the hash for a blob whose content is the output of the previous command (in your example), but it could be anything else (like "hello world!"). Here, try this:
(printf 'blob 277\0'; cat fi) | shasum
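The output is the same as the previous command's. This is basically how Git hashes a blob: the header "blob <size>\0" followed by the content. So by running git hash-object on fi, you are computing the ID of a blob object. But as we have seen, 9eabf5b536662000f79978c4d1b6e4eff5c8d785 is a commit, not a blob, so hashing fi as a blob cannot give the same hash. Hashing the same bytes as a commit does: git hash-object -t commit fi uses the header "commit 277\0" instead and gives 9eabf5b536662000f79978c4d1b6e4eff5c8d785.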
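A sketch of the blob-versus-commit difference in a throwaway repository (hypothetical identity; the file name `fi` mirrors the question). The same bytes hashed with the default type give a blob ID, and hashed with `-t commit` give back the commit's own ID:

```bash
#!/usr/bin/env bash
# Sketch: the same bytes get different IDs depending on the declared type.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
git commit -q --allow-empty -m 'first commit'

git cat-file commit HEAD > fi               # raw commit content
as_blob=$(git hash-object fi)               # default type: "blob <size>\0"
as_commit=$(git hash-object -t commit fi)   # header "commit <size>\0"

echo "HEAD:           $(git rev-parse HEAD)"
echo "fi as a blob:   $as_blob"
echo "fi as a commit: $as_commit"
```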
A commit's hash is based on several other pieces of information which make it unique (such as the tree, the parent commits, the committer, the author, the dates and the message). The following article tells you exactly what a commit hash is made of:
The anatomy of a git commit
So you could get the same hash by providing all the data specified in the article with the exact same values as those used in the original commit.
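A sketch of that idea with `git commit-tree`, in a throwaway repository: pinning the identity and both timestamps through the standard environment variables (the values here are hypothetical stand-ins) and using the same tree and message twice produces the same commit ID both times:

```bash
#!/usr/bin/env bash
# Sketch: build the same commit twice from identical inputs.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
# Pin both timestamps, in Git's internal "<unix seconds> <offset>" format.
export GIT_AUTHOR_DATE='1432673399 -0700' GIT_COMMITTER_DATE='1432673399 -0700'

tree=$(git write-tree)   # empty index, so this writes the empty tree
first=$(git commit-tree -m 'Git 2.4.2' "$tree")
second=$(git commit-tree -m 'Git 2.4.2' "$tree")

echo "$first"
echo "$second"   # identical to $first: same inputs, same bytes, same hash
```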
This might be helpful as well:
Git from the bottom up