Why does git hash-object return a different hash than openssl sha1?

Tags:

git

openssl

sha1

Context: I downloaded a file (Audirvana 0.7.1.zip) from code.google to my MacBook Pro (Mac OS X 10.6.6).

I wanted to verify the checksum, which for that particular file is posted as 862456662a11e2f386ff0b24fdabcb4f6c1c446a (SHA-1). git hash-object gave me a different hash, but openssl sha1 returned the expected 862456662a11e2f386ff0b24fdabcb4f6c1c446a.

The following experiment seems to rule out any possible download corruption or newline differences and to indicate that there are actually two different algorithms at play:

    $ echo A > foo.txt
    $ cat foo.txt
    A
    $ git hash-object foo.txt
    f70f10e4db19068f79bc43844b49f3eece45c4e8
    $ openssl sha1 foo.txt
    SHA1(foo.txt)= 7d157d7c000ae27db146575c08ce30df893d3a64

What's going on?

asked Mar 13 '11 15:03

twcamper

2 Answers

You see a difference because git hash-object doesn't just hash the bytes in the file: it first prepends the string "blob ", the file size in bytes (as a decimal number), and a NUL byte to the file's contents. There are more details in this other answer on Stack Overflow:

How to assign a Git SHA1's to a file without Git?

Or, to convince yourself, try something like:

    $ echo -n hello | git hash-object --stdin
    b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
    $ printf 'blob 5\0hello' > test.txt
    $ openssl sha1 test.txt
    SHA1(test.txt)= b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
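The check above can be wrapped in a small shell function that works for any file. This is only a sketch of the idea, not how Git is implemented: git_blob_sha1 and sha1_hex are hypothetical helper names, and the code assumes that one of sha1sum, shasum, or openssl is on the PATH.

```shell
# sha1_hex: read stdin and print only the hex SHA-1 digest.
# Assumption: one of sha1sum, shasum, or openssl is installed.
sha1_hex() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d' ' -f1
    elif command -v shasum >/dev/null 2>&1; then
        shasum -a 1 | cut -d' ' -f1
    else
        # openssl prints something like "(stdin)= <hex>"; keep the hex part
        openssl sha1 | sed 's/^.*= *//'
    fi
}

# git_blob_sha1 FILE: hash "blob <size>" plus a NUL byte, followed by
# the file's own bytes (hypothetical helper, not a Git command).
git_blob_sha1() {
    size=$(wc -c < "$1" | tr -d ' ')   # byte count; tr strips wc padding
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1_hex
}
```

For a file containing just the five bytes "hello", git_blob_sha1 should print the same b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0 shown above, while plain sha1_hex on that file gives the ordinary SHA-1 that openssl sha1 reports.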

answered Oct 07 '22 06:10

Mark Longair

The SHA-1 digest is calculated over a header string followed by the file data. The header consists of the object type, a space, and the object length in bytes as a decimal number. The header is separated from the data by a null byte.

So:

    $ git hash-object foo.txt
    f70f10e4db19068f79bc43844b49f3eece45c4e8
    $ ( perl -e '$size = (-s shift); print "blob $size\x00"' foo.txt \
        && cat foo.txt ) | openssl sha1
    f70f10e4db19068f79bc43844b49f3eece45c4e8
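To look at the header bytes themselves rather than only the resulting digest, the header can be printed on its own and dumped with od. A minimal sketch: git_object_header is a hypothetical helper name, not a Git command, and the temporary file stands in for the foo.txt from the question.

```shell
# git_object_header TYPE FILE: print only the header that precedes the
# data: "<type> <size in bytes>" followed by a single NUL byte.
git_object_header() {
    size=$(wc -c < "$2" | tr -d ' ')   # byte count; tr strips wc padding
    printf '%s %s\0' "$1" "$size"
}

# Demo on a 2-byte file ("A" plus a newline), as in the question.
tmp=$(mktemp)
printf 'A\n' > "$tmp"
# od -c lists each byte: b, l, o, b, a space, 2, and \0
git_object_header blob "$tmp" | od -c
rm -f "$tmp"
```

The header here is 7 bytes long. The 2 bytes of file content follow it, and the SHA-1 is taken over all 9 bytes.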

One consequence of this is that "the" empty tree and "the" empty blob have different IDs. That is:

    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 always means "empty file"
    4b825dc642cb6eb9a060e54bf8d69288fbee4904 always means "empty directory"

You will find that you can in fact run git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 in a new Git repository with no objects stored, because the empty tree is recognised as a special case and never actually stored (with modern Git versions). By contrast, if you add an empty file to your repo, a blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" will be stored.
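Both IDs can be reproduced without a repository by hashing a header that declares zero bytes of content and has nothing after it. A sketch, assuming sha1sum, shasum, or openssl is available; sha1_hex and empty_object_id are hypothetical helper names.

```shell
# sha1_hex: read stdin and print only the hex SHA-1 digest.
# Assumption: one of sha1sum, shasum, or openssl is installed.
sha1_hex() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d' ' -f1
    elif command -v shasum >/dev/null 2>&1; then
        shasum -a 1 | cut -d' ' -f1
    else
        openssl sha1 | sed 's/^.*= *//'
    fi
}

# empty_object_id TYPE: hash the header "<type> 0" plus a NUL byte,
# with no data following it.
empty_object_id() {
    printf '%s 0\0' "$1" | sha1_hex
}

empty_object_id blob   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
empty_object_id tree   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The only difference between the two inputs is the type word in the header, which is why an empty file and an empty directory never share an ID.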

answered Oct 07 '22 07:10

araqnid
