I was wondering what parameters a Git SHA depends on. I am guessing that, besides the content of the commit, there are other parameters, such as the timestamp, that go into constructing the SHA.
I am interested in all such parameters. I am also interested in the situation where all of them are the same, or are forced to be the same, so that two commits made by two different people end up with exactly the same Git SHA.
Every time a commit is added to a git repository, a hash string which identifies this commit is generated. This hash is computed with the SHA-1 algorithm and is 160 bits (20 bytes) long. Expressed in hexadecimal notation, such hashes are 40 digit strings.
"SHA" stands for Secure Hash Algorithm. The checksum is the result of feeding the commit's content and metadata to an algorithm that generates these 40-character strings. For all practical purposes, a checksum uniquely identifies a commit.
You can obviously refer to any single commit by its full, 40-character SHA-1 hash, but there are more human-friendly ways to refer to commits as well. This section outlines the various ways you can refer to any commit.
For a commit, the ID depends on checksums of at least...
If you change just about anything about the commit, the commit ID changes.
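To make this concrete, here is a minimal sketch of how the ID of a plain commit can be reproduced by hand. It assumes an unsigned commit with no extra headers (no gpgsig, no encoding line), and the name, email and timestamps are made up for illustration. For such a commit, Git hashes the text "commit <size>", a NUL byte, and then the commit body: the tree ID, the parent IDs, the author line, the committer line, and the message.

```python
import hashlib


def git_commit_id(tree, parents, author, committer, message):
    """Hash a commit object the way Git does for a plain, unsigned commit."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # zero, one, or several parents
    lines.append(f"author {author}")            # "Name <email> unix-time tz"
    lines.append(f"committer {committer}")      # same layout as the author line
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = b"commit %d\x00" % len(body)       # object type, size in bytes, NUL
    return hashlib.sha1(header + body).hexdigest()


# ID of Git's well-known empty tree; the identity below is hypothetical.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ALICE = "Alice Example <alice@example.com> 1700000000 +0000"
ALICE_LATER = "Alice Example <alice@example.com> 1700000001 +0000"

base = git_commit_id(EMPTY_TREE, [], ALICE, ALICE, "Initial commit\n")
same = git_commit_id(EMPTY_TREE, [], ALICE, ALICE, "Initial commit\n")
later = git_commit_id(EMPTY_TREE, [], ALICE, ALICE_LATER, "Initial commit\n")
child = git_commit_id(EMPTY_TREE, [base], ALICE, ALICE, "Initial commit\n")

print(base == same)    # identical fields give an identical ID
print(base == later)   # committer timestamp differs by one second
print(base == child)   # same content, but a different parent
```

So two people get the same commit ID only when the tree, the parents, both identity lines (including timestamps and time zones) and the message all match byte for byte. You can compare the fields of a real commit against this layout with git cat-file -p <commit>.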
Including the parent commit IDs is very important. It means two commits with exactly the same content, but built on different parents, will still have different IDs. Why would you do that? It means that if the IDs of two commits are the same, you know their entire history is the same. This makes it very efficient to compare and update Git repositories. "I have branch foo at commit ABC123, you do too? Great, we're in sync!"
When comparing Git to other version control systems, remember that in many popular "reliable" systems, like Subversion or CVS, anyone with the file permissions can go in and undetectably change history in the central repository. With Git, such tampering will be detected immediately because it changes all the downstream commit IDs; and if the attacker brute-forced content to match the existing IDs, that content would be complete nonsense.
The possibility of a SHA-1 collision has already been considered. Long story short, in a conflict the existing object wins.
The probability of a SHA1 collision happening accidentally is so vanishingly small, I hope your asteroid, cosmic ray, and wolf attack insurances are paid up.
If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (3.6 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
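The "roughly 2 years" figure can be sanity-checked with the standard birthday-bound approximation. The sketch below is a rough estimate, not an exact calculation: it assumes SHA-1 output behaves like a uniformly random 160-bit value, uses the approximation n ≈ sqrt(2 · 2^160 · ln(1 / (1 - p))) for the number of objects needed to reach collision probability p, and takes the population and object counts from the paragraph above.

```python
import math

HASH_BITS = 160                       # SHA-1 output size
PEOPLE = 6.5e9                        # programmers, from the quoted scenario
OBJECTS_PER_PERSON_PER_SECOND = 3.6e6 # one "Linux kernel history" per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def objects_for_collision_probability(p, bits=HASH_BITS):
    """Birthday-bound estimate of objects needed for collision probability p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


needed = objects_for_collision_probability(0.5)
rate = PEOPLE * OBJECTS_PER_PERSON_PER_SECOND   # objects pushed per second
years = needed / rate / SECONDS_PER_YEAR

print(f"{needed:.2e} objects needed, about {years:.1f} years at this rate")
```

Under these assumptions the estimate comes out a little under two years, which agrees with the figure quoted above. Note that this covers accidental collisions only; it says nothing about deliberately constructed ones.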
Seriously, there are better things to worry about, like the 1 in 100 chance of a drive failure. How are your backups?