 

Where do commit ids come from?

Tags:

git

I'm just really curious about this. Commit ids can't be randomised, since they need to be unique. They appear to be random, though, which got me wondering: why aren't they just consecutive numbers? I mean, they only need to be unique within the repository, right? Or am I wrong here?

Thanks!

Dunno asked Jul 25 '14 at 14:07



2 Answers

Git Commit IDs are SHA-1 Hashes

In a distributed version control system like Git, commit identifiers must be consistent across every clone of a repository. Because Git history is a directed acyclic graph rather than a linear series, commits and other objects are identified by a SHA-1 hash, which is unambiguous across systems.

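Commit IDs aren't random in Git. They are SHA-1 hashes of the commit object, which records the hash of the top-level tree (itself a manifest of the blobs and subtrees in the snapshot), the hash(es) of the parent commit(s), the author and committer with their timestamps, and the commit message. See Git Internals for additional details. The end result is that any given object hash is deterministic: the same object content always produces the same hash, regardless of how it arrived in its current state.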
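To illustrate that determinism, here is a minimal Python sketch of how an object id is derived in a repository using Git's default SHA-1 object format: Git hashes a short header (the object type, a space, the content size in bytes, then a NUL byte) followed by the raw content. The helper name `git_object_id` is hypothetical; a commit id is computed the same way with type `commit`, where the content is the textual commit object.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 id Git assigns to `content` stored as `obj_type`.

    The hashed bytes are a header "<type> <size>\\0" followed by the content.
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()

# The same bytes give the same id on any machine, with no coordination needed.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the value printed by
# `echo 'hello world' | git hash-object --stdin`
```

Because the id depends only on the bytes being stored, two clones that create identical objects independently arrive at identical ids.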

Todd A. Jacobs answered Oct 19 '22 at 09:10



Since Git is distributed, there is no single "ground truth" repository that can decide which commit gets which id. Repositories also cannot coordinate with each other about which ids are already taken. Hence every Git installation has to generate ids on its own in a way that minimises the risk of a collision (two different commits having the same id).

To achieve that, Git uses the SHA-1 hash algorithm to calculate commit ids. Each commit id consists of 160 bits of data, meaning there are 2^160 possible ids (roughly 1.46 × 10^48, i.e. about a 1 followed by 48 zeroes).
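A quick Python check of those numbers, using only the standard `hashlib` module (the hashed string is an arbitrary placeholder, not a real commit object):

```python
import hashlib

# A SHA-1 digest is 20 bytes = 160 bits, usually shown as 40 hex characters.
digest = hashlib.sha1(b"any commit object").digest()
print(len(digest) * 8)           # 160
print(len(digest.hex()))         # 40

key_space = 2 ** 160             # number of distinct possible ids
print(len(str(key_space)))       # 49 decimal digits
print(f"{key_space:.2e}")        # 1.46e+48
```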

Using a hash function does not guarantee uniqueness, but it makes an accidental collision extremely unlikely, because cryptographic hash functions are specifically designed to spread their outputs evenly and to make collisions hard to find.
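To put "extremely unlikely" into numbers, here is a sketch of the standard birthday-bound approximation, p ≈ 1 − exp(−n(n−1) / (2·2^160)), for n objects. It assumes the ids behave like uniformly random 160-bit values, which is a modelling assumption about a good hash function rather than something Git guarantees, and it says nothing about deliberately constructed collisions. The function name is hypothetical.

```python
import math

ID_SPACE = 2 ** 160  # number of possible SHA-1 ids

def collision_probability(n: int, space: int = ID_SPACE) -> float:
    """Approximate chance that n random ids contain at least one duplicate.

    Birthday bound: p ~= 1 - exp(-n(n-1) / (2 * space)).
    math.expm1 keeps precision when the exponent is tiny.
    """
    return -math.expm1(-n * (n - 1) / (2 * space))

for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} objects -> p ~= {collision_probability(n):.1e}")
```

Even a trillion objects in one repository leaves the estimated chance of an accidental collision far below anything observable in practice.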

SVN, on the other hand, has a central repository and can therefore hand out consecutive integer revision numbers.
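A small sketch of why consecutive numbers break down without a central server: two clones of the same history each pick "the next number" locally and clash, while a content-derived id stays distinct. Both helpers are hypothetical, and `content_id` hashes only a parent id and a message, a deliberate simplification of a real Git commit object.

```python
import hashlib

def next_revision(history: list[str]) -> int:
    """Hypothetical local counter: the next consecutive revision number."""
    return len(history) + 1

def content_id(parent_id: str, message: str) -> str:
    """Hypothetical content-derived id: SHA-1 over the parent id and message."""
    return hashlib.sha1(f"{parent_id}\n{message}".encode("utf-8")).hexdigest()

shared = ["initial import"]          # history both clones start from
parent = content_id("", shared[0])

# Alice and Bob each commit offline, with no central server to ask.
alice_rev = next_revision(shared)    # 2
bob_rev = next_revision(shared)      # also 2: the numbers clash
alice_id = content_id(parent, "fix typo")
bob_id = content_id(parent, "add feature")

print(alice_rev == bob_rev)  # True
print(alice_id == bob_id)    # False: different content, different id
```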

Git itself has no mechanism for resolving collisions: if you pull a set of commits containing one or more colliding objects, Git simply ignores the incoming object and keeps the one it already has.
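The "first object wins" behaviour described above can be modelled with a toy content-addressed store. This is a hypothetical sketch, not Git's actual object database code; the id `"abc123"` is a made-up value used to simulate two different objects arriving with the same id.

```python
class ObjectStore:
    """Hypothetical toy content-addressed store: an object whose id is
    already present is not stored again, so the original is kept."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, object_id: str, content: bytes) -> bool:
        """Store content under object_id; return False if the id was taken."""
        if object_id in self._objects:
            return False  # keep the existing object, drop the newcomer
        self._objects[object_id] = content
        return True

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

store = ObjectStore()
store.put("abc123", b"original commit")
# Simulated fetch of a different object that (hypothetically) has the same id.
accepted = store.put("abc123", b"different commit, same id")
print(accepted, store.get("abc123"))  # False b'original commit'
```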

Also, using a hashing algorithm doesn't only solve the problem of commit ids; it is also a security measure. Since all the data of a commit (the snapshot of the tree, the author, the date, the message and the SHA-1 of the parent commit) is used to calculate its hash, it is impossible to change a patch afterwards without changing the hash of that commit and of every commit made since then.
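A minimal Python sketch of that chaining effect. It follows Git's textual commit format (a `tree` line, an optional `parent` line, `author` and `committer` lines, a blank line, then the message) but uses a made-up, fixed author and timestamp, and a toy single-file tree; the helper names are hypothetical. Editing only the first snapshot changes the id of every later commit, because each commit embeds its parent's id.

```python
import hashlib
from typing import Optional

def git_object_id(obj_type: str, content: bytes) -> str:
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()

def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """SHA-1 id of a minimal commit object (assumed fixed author/date)."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    who = "A U Thor <author@example.com> 1700000000 +0000"  # made-up identity
    lines += [f"author {who}", f"committer {who}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode("utf-8"))

def build_history(snapshots: list[bytes]) -> list[str]:
    """Create one commit per snapshot; each commit names its parent's id."""
    ids: list[str] = []
    parent: Optional[str] = None
    for i, data in enumerate(snapshots):
        # Toy tree holding a single regular file "file.txt" with this content.
        blob = git_object_id("blob", data)
        tree = git_object_id("tree", b"100644 file.txt\0" + bytes.fromhex(blob))
        parent = commit_id(tree, parent, f"commit {i}")
        ids.append(parent)
    return ids

original = build_history([b"v1\n", b"v2\n", b"v3\n"])
tampered = build_history([b"v1 (edited)\n", b"v2\n", b"v3\n"])
print([a == b for a, b in zip(original, tampered)])  # [False, False, False]
```

Commits 2 and 3 have identical file contents in both histories, yet their ids still differ, because the changed first commit propagates through the `parent` lines.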

Nils Werner answered Oct 19 '22 at 11:10
