
When does git actually verify the integrity of the commit chain?

I have read multiple times about the integrity mechanism in git based on SHA-1 hashes and links to parent commits, which ensures that no changes are made to the committed data in the git repository.

My question is: during which operations does git check that the hashes are valid, i.e. that they match the content of the commits? Is a check performed during a push, or maybe a pull? Unfortunately, I haven't found any information on this.

Jeff S. asked Jun 01 '18 20:06



1 Answer

Obsidian's comment is spot-on: the name of each Git object is the hash ID of the object's content, so anything that uses the ID to look up and read the content can, and usually does, verify that the hash of the extracted data matches the ID used as a key to extract that data.
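To make the lookup-and-verify idea concrete, here is a minimal Python sketch, not Git's actual code. It assumes the SHA-1 object format, where an object's ID is the SHA-1 of a header of the form "type size" followed by a NUL byte and then the content, and where a loose object is that same byte string compressed with zlib. The function names git_object_id and read_loose_object are hypothetical and chosen only for illustration.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for the given type and content."""
    # The hash covers "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def read_loose_object(compressed: bytes, expected_id: str) -> bytes:
    """Decompress a loose object and check it against the ID used to find it.

    Hypothetical helper: it mimics the "verify on read" idea and raises
    ValueError if the stored bytes do not hash to expected_id.
    """
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")
    _obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    actual_id = hashlib.sha1(raw).hexdigest()
    if actual_id != expected_id:
        raise ValueError(f"hash mismatch: expected {expected_id}, got {actual_id}")
    return content


if __name__ == "__main__":
    data = b"test content\n"
    oid = git_object_id("blob", data)
    print(oid)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

    stored = zlib.compress(b"blob 13\x00" + data)
    print(read_loose_object(stored, oid))  # b'test content\n'
```

The printed ID is the same one that `echo 'test content' | git hash-object --stdin` reports, which is why a reader that recomputes the hash can detect stored data that no longer matches its name.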

Additional checking, such as verifying the GPG signature in a tag or commit, is only done when you specifically request it. You can have git log check such signatures by default by setting the log.showSignature configuration option.

Note that the integrity of any node in a Merkle tree depends on whether you trust prior nodes against second-preimage attacks. If you use GPG-signed tags, the signatures in those tags protect each tag's data (to whatever degree you trust GPG itself), and then the tag protects its commit object (to whatever degree you trust SHA-1). The commit object in turn protects its tree, which protects its subtrees and blobs, and the blob hashes protect their contents. So you should do a different kind of analysis if you're concerned with second-preimage attacks. If you're just concerned with random data corruption (as seen on spinning media and/or non-ECC memory), you can just use the SHA-1 hash directly the way Git does.
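The chain of protection described above (commit protects tree, tree protects blobs) can be illustrated with a simplified Python sketch. It assumes the SHA-1 object format and handles only a flat directory of regular files with mode 100644; the helper names (object_id, tree_content, commit_content, snapshot_id) and the fixed author line are hypothetical, and real Git trees and commits carry more detail than this.

```python
import hashlib
from typing import Dict, Optional


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" plus content, as in Git's SHA-1 format."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def tree_content(blob_ids: Dict[str, str]) -> bytes:
    """Serialize a flat tree: "100644 <name>\\0" followed by the raw 20-byte ID."""
    out = b""
    for name in sorted(blob_ids):  # simplified ordering, flat files only
        out += b"100644 " + name.encode("utf-8") + b"\x00"
        out += bytes.fromhex(blob_ids[name])
    return out


def commit_content(tree_id: str, parent_id: Optional[str], message: str) -> bytes:
    """Serialize a commit that names its tree and, optionally, one parent."""
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    # Fixed, made-up identity and timestamp so the result is reproducible.
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


def snapshot_id(files: Dict[str, bytes],
                parent_id: Optional[str] = None,
                message: str = "demo") -> str:
    """Hash blobs, then the tree naming them, then the commit naming the tree."""
    blob_ids = {name: object_id("blob", data) for name, data in files.items()}
    tree_id = object_id("tree", tree_content(blob_ids))
    return object_id("commit", commit_content(tree_id, parent_id, message))


if __name__ == "__main__":
    good = snapshot_id({"a.txt": b"hello\n"})
    tampered = snapshot_id({"a.txt": b"hellp\n"})
    print(good != tampered)  # True: a one-byte change reaches the commit ID
    print(snapshot_id({"a.txt": b"hello\n"}, parent_id=good) !=
          snapshot_id({"a.txt": b"hello\n"}, parent_id=tampered))  # True
```

Because each level's hash covers the IDs of the level below, trusting one commit ID (for example, one reached from a signed tag) extends to everything reachable from it, to whatever degree you trust the hash function.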

torek answered Oct 04 '22 20:10