At its core, the Git version control system is a content-addressable filesystem. It uses the SHA-1 hash function to name content.
As is well known, Git uses SHA-1 to calculate a hash for every object it stores, not just commits: files, directories, and revisions are all referred to by hash values, unlike many traditional version control systems, where files or versions are referred to by sequential numbers.
The Git version control system computes SHA-1 checksums over the contents of every commit. In fact, the checksum is used as the commit identifier and is commonly referred to as "the SHA". A commit's checksum also covers its metadata, including the author, the date, the previous commit's SHA, and the hash of the tree that records the commit's contents.
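To make the commit case concrete, here is a minimal Python sketch, assuming the usual commit layout: `tree`, `parent`, `author` and `committer` header lines, a blank line, then the message. The `tree` line uses the well-known empty-tree hash; the parent hash, name, email and timestamps are made-up placeholders, not values from a real repository, and `git_object_hash` is a hypothetical helper name.

```python
import hashlib

def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0<body>', the framing Git uses for every object."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical commit body: placeholder parent hash, identity and timestamps.
commit_body = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent 0123456789abcdef0123456789abcdef01234567\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
).encode()

print(git_object_hash("commit", commit_body))

# Changing any metadata (here only the author timestamp) changes the commit hash.
later = commit_body.replace(
    b"1700000000 +0000\ncommitter", b"1700000060 +0000\ncommitter"
)
print(git_object_hash("commit", later) != git_object_hash("commit", commit_body))  # True
```

Because the parent's SHA is part of the hashed body, rewriting any earlier commit changes the hashes of all its descendants.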
When hashing a file's contents (a "blob"), Git prefixes the data with "blob ", followed by the length in bytes (as a human-readable decimal integer), followed by a NUL character:
$ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d
Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
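The `14` in that command is the 13 bytes of `Hello, World!` plus the newline that `echo` appends. A small Python sketch, using only the standard `hashlib` module, that builds the same header explicitly:

```python
import hashlib

content = b"Hello, World!\n"  # echo adds the trailing newline: 13 + 1 = 14 bytes
header = b"blob " + str(len(content)).encode() + b"\0"
print(header)  # b'blob 14\x00'
print(hashlib.sha1(header + content).hexdigest())
# 8ab686eafeb1f44702738c8b0f24f2567c36da6d, matching the shasum output above
```

If your shell's `echo` does not understand `-e`, `printf 'blob 14\0Hello, World!\n' | shasum` feeds the same bytes.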
I am only expanding on the answer by @Leif Gruenwoldt, detailing what is in the reference he provided.
Do It Yourself
- Step 1. Create an empty text document (name does not matter) in your repository
- Step 2. Stage and commit the document
- Step 3. Identify the hash of the blob by running
git ls-tree HEAD
- Step 4. Observe that the blob's hash is
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
- Step 5. Snap out of your surprise and read below
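The surprise in Step 4 can be reproduced without a repository. A minimal Python sketch: an empty file has size 0, so the entire input to SHA-1 is just the header.

```python
import hashlib

# An empty file has size 0, so the whole hashed input is "blob 0" plus a NUL byte.
empty_blob = hashlib.sha1(b"blob 0\0").hexdigest()
print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

This is why every empty file in every Git repository has the same blob hash.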
How does Git compute its blob hashes
Blob Hash (SHA-1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)
The text blob⎵ (the word "blob" followed by a space) is a constant prefix, and \0 is also constant: it is the NUL character. Only <size_of_file> (the file's length in bytes, written as a decimal number) and <contents_of_file> vary from file to file.
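One detail worth stressing: <size_of_file> counts bytes, not characters, which matters for non-ASCII text. A Python sketch, assuming the file is stored as UTF-8; `blob_hash` is a hypothetical helper name.

```python
import hashlib

def blob_hash(content: bytes) -> str:
    # The size is len() of the raw bytes, written as decimal ASCII digits.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

text = "héllo\n"
data = text.encode("utf-8")
print(len(text), len(data))  # 6 7  ('é' takes two bytes in UTF-8)
print(blob_hash(data))
```

Using the character count (6) in the header would produce a different, wrong hash.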
See: What is the file format of a git commit object?
And that's all, folks!
But wait! Did you notice that the <filename> is not an input to the hash computation? Two files have the same hash whenever their contents are identical, regardless of their names or when they were created. This is one of the reasons Git handles moves and renames better than many other version control systems.
Do It Yourself (Ext)
- Step 6. Create another empty file with a different filename in the same directory.
- Step 7. Compare the hashes of both your files.
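Steps 6 and 7 can also be simulated in Python. The file names below are hypothetical; only the contents feed into the hash.

```python
import hashlib

def blob_hash(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

# Hypothetical file names mapped to their contents.
files = {"notes.txt": b"", "empty-copy.md": b"", "greeting.txt": b"Hello, World!\n"}
hashes = {name: blob_hash(body) for name, body in files.items()}
print(hashes["notes.txt"] == hashes["empty-copy.md"])  # True: same (empty) contents
print(hashes["notes.txt"] == hashes["greeting.txt"])   # False: different contents
```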
Note: The link does not mention how the tree object is hashed. I am not certain of the algorithm and parameters; however, from my observation it probably computes a hash based on all the blobs and trees it contains (most likely on their hashes).
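For what it's worth, my understanding of the tree format (check it against Git's documentation before relying on it) is this. Each entry is `<mode> <name>`, a NUL byte, then the 20 raw bytes of the child's hash (not the hex form). Entries are sorted by name, with directory names compared as if they ended in `/`. The concatenated entries are then framed as `tree <size>\0` and hashed like any other object. A Python sketch under those assumptions; `object_hash` and `tree_hash` are hypothetical names.

```python
import hashlib

def object_hash(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of '<type> <size>\\0<body>'."""
    return hashlib.sha1(b"%s %d\0" % (obj_type.encode(), len(body)) + body).digest()

def tree_hash(entries):
    """entries: list of (mode, name, raw_sha) tuples, e.g. ("100644", "a.txt", sha).

    Assumes directory entries use mode "40000" and sort as if named "<name>/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + raw_sha
        for mode, name, raw_sha in sorted(entries, key=sort_key)
    )
    return object_hash("tree", body)

empty_blob = object_hash("blob", b"")
print(tree_hash([]).hex())  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
print(tree_hash([("100644", "empty.txt", empty_blob)]).hex())
```

This also supports the observation above: a tree's hash depends on its children's hashes (and names and modes), so any change deep in the directory structure propagates up to the root tree.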
git hash-object
This is a quick way to verify your test method:
s='abc'
printf '%s' "$s" | git hash-object --stdin
printf 'blob %s\0%s' "$(printf '%s' "$s" | wc -c | tr -d ' ')" "$s" | sha1sum
Output:
f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f
f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f -
where sha1sum is part of GNU Coreutils.
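Then it comes down to understanding the format of each object type. We have already covered the trivial blob; the other types (tree, commit and tag) are hashed with the same "<type> <size>\0<content>" framing, each with its own content format.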
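All four object types share the same `<type> <size>\0<content>` framing; only the content format differs. Here is a Python sketch of a type-aware helper in the spirit of `git hash-object -t <type>`, applied to a hypothetical annotated-tag body. The target hash, tag name and tagger are placeholders, and `hash_object` is a hypothetical name.

```python
import hashlib

OBJECT_TYPES = {"blob", "tree", "commit", "tag"}

def hash_object(content: bytes, obj_type: str = "blob") -> str:
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {obj_type}")
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical annotated tag: the object hash and tagger are placeholders.
tag_body = (
    "object 0123456789abcdef0123456789abcdef01234567\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Release 1.0\n"
).encode()

print(hash_object(b"abc"))  # f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f
print(hash_object(tag_body, "tag"))
```

Because the type name is part of the header, the same bytes hashed as a blob and as a tag produce different hashes.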
Based on Leif Gruenwoldt's answer, here is a shell function to substitute for git hash-object:
git-hash-object () { # substitute when the `git` command is not available
  local type=blob
  [ "$1" = "-t" ] && shift && type=$1 && shift
  # depending on eol/autocrlf settings, you may want to substitute CRLFs by LFs
  # by using `perl -pe 's/\r$//g'` instead of `cat` in the next 2 commands
  # (tr strips the leading padding that BSD/macOS wc adds to its count)
  local size=$(cat "$1" | wc -c | tr -d ' ')
  # printf is more portable than `echo -en` for emitting the NUL byte
  ( printf '%s %s\0' "$type" "$size"; cat "$1" ) | sha1sum | sed 's/ .*$//'
}
Test:
$ echo 'Hello, World!' > test.txt
$ git hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
$ git-hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
I needed this for some unit tests in Python 3, so I thought I'd leave it here.
import hashlib

def git_blob_hash(data):
    """Return the Git blob SHA-1 of data (a str is encoded as UTF-8 first)."""
    if isinstance(data, str):
        data = data.encode()
    data = b'blob ' + str(len(data)).encode() + b'\0' + data
    h = hashlib.sha1()
    h.update(data)
    return h.hexdigest()
I stick to \n line endings everywhere, but in some circumstances Git might change your line endings before calculating this hash, so you may need a .replace('\r\n', '\n') in there too (or .replace(b'\r\n', b'\n') once the data is bytes).
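If you do need that normalization, here is a hedged sketch. It assumes the repository converts CRLF to LF on commit (as `core.autocrlf=true` or a `text` attribute would). Git's real conversion rules have more cases, such as files it detects as binary, so treat this as an approximation for tests only. `git_blob_hash_lf` is a hypothetical name.

```python
import hashlib

def git_blob_hash_lf(data) -> str:
    """Blob hash after converting CRLF to LF, approximating Git's text normalization."""
    if isinstance(data, str):
        data = data.encode()
    data = data.replace(b"\r\n", b"\n")  # bytes need a bytes pattern
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

print(git_blob_hash_lf("Hello, World!\r\n"))
# 8ab686eafeb1f44702738c8b0f24f2567c36da6d, same as the LF-only file
```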