
What does a git SHA depend on?

Tags:

git

I was wondering what parameters a git SHA depends on. I am guessing that, besides the content of the commit, there are other parameters, such as a timestamp, that go into constructing the SHA.

I am interested in all the parameters it depends on. I am also interested in the situation where all of these parameters are the same, or are forced to be the same, so that two commits made by two different people end up with exactly the same git SHA.

asked Sep 29 '15 22:09 by pranavk


People also ask

How is Git Sha generated?

Every time a commit is added to a git repository, a hash string which identifies this commit is generated. This hash is computed with the SHA-1 algorithm and is 160 bits (20 bytes) long. Expressed in hexadecimal notation, such hashes are 40 digit strings.

What does Git Sha mean?

"SHA" stands for Secure Hash Algorithm. The checksum is the result of feeding the commit object (which references the full snapshot of the files, the parent commits, the author and committer details, and the log message) to an algorithm that generates these 40-character strings. A checksum uniquely identifies a commit.

How many characters are used in Sha key in Git?

You can obviously refer to any single commit by its full, 40-character SHA-1 hash, but there are more human-friendly ways to refer to commits as well. This section outlines the various ways you can refer to any commit.


1 Answer

For a commit, the ID is a checksum over at least...

  • The tree ID (a snapshot of all the files and directories), which is in turn made up of...
    • The full content of every file (not the diff), each stored as a blob.
    • The directory tree (names of files and directories and how they're organized).
    • The mode of each file and directory (Git records the executable bit and the entry type, not full permissions).
  • The parent commit ID(s).
  • The log message.
  • The committer name, email, and date.
  • The author name, email, and date.

If you change just about anything about the commit, the commit ID changes.
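To make the list above concrete, here is a minimal sketch of how an object ID can be reproduced by hand in a SHA-1 repository: Git hashes a short header (`<type> <size>` plus a NUL byte) followed by the object body. The helper names, the identity string, and the timestamp are made up for illustration, and the sketch assumes a plain commit with no extra headers (such as a signature or an encoding line), which would also feed into the hash.

```python
import hashlib
from typing import List


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body with SHA-1, as Git does for its objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: List[str], author: str,
                committer: str, message: str) -> bytes:
    """Assemble the text of a plain commit object (no signature or extra headers)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()


# The empty tree is a handy stand-in for "identical content".
EMPTY_TREE = git_object_id("tree", b"")

# Hypothetical identity: name, email, Unix timestamp, timezone offset.
IDENT = "A U Thor <author@example.com> 1443564540 +0000"

alice = git_object_id(
    "commit", commit_body(EMPTY_TREE, [], IDENT, IDENT, "Initial commit\n"))
bob = git_object_id(
    "commit", commit_body(EMPTY_TREE, [], IDENT, IDENT, "Initial commit\n"))

# Same commit, but the committer date is one second later.
one_second_later = IDENT.replace("1443564540", "1443564541")
later = git_object_id(
    "commit", commit_body(EMPTY_TREE, [], IDENT, one_second_later, "Initial commit\n"))

print(alice == bob)    # True: every input matches, so the IDs match
print(alice == later)  # False: a one-second difference changes the ID
```

So two people only get the same commit ID if the tree, parents, author line, committer line, and message are byte-for-byte identical. In practice that means forcing the names, emails, and dates to match, for example through the `GIT_AUTHOR_*` and `GIT_COMMITTER_*` environment variables.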

Including the parent commit IDs is very important. It means two commits with exactly the same content, but built on different parents, will still have different IDs. Why would you do that? It means that if the IDs of two commits are the same, you know their entire history is the same. This makes it very efficient to compare and update Git repositories. "I have branch foo at commit ABC123, you do too? Great, we're in sync!"


When comparing Git to other version control systems, remember that in many popular "reliable" systems, like Subversion or CVS, anyone with the file permissions can go in and undetectably change history in the central repository. With Git, such tampering is detected immediately because it changes all the downstream commit IDs. And if an attacker brute-forced new content to match the existing IDs, that content would be complete nonsense.
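The way a change ripples through every downstream ID can be sketched with a toy history. This is only an illustration under simplifying assumptions: every commit points at Git's empty tree, the identity and timestamp are hypothetical and fixed, and `commit_id` and `history` are names invented for this example.

```python
import hashlib
from typing import List, Optional

# Hypothetical fixed identity and timestamp, so only message and parent vary.
WHO = "A U Thor <author@example.com> 1443564540 +0000"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of Git's empty tree


def commit_id(parent: Optional[str], message: str) -> str:
    """SHA-1 ID of a plain commit object with at most one parent."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {WHO}", f"committer {WHO}", "", message + "\n"]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def history(messages: List[str]) -> List[str]:
    """Build a linear chain of commits and return their IDs, oldest first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


original = history(["add README", "fix typo", "release 1.0"])
tampered = history(["add README (edited)", "fix typo", "release 1.0"])

# Only the first commit was edited, yet every later ID differs too,
# because each commit hashes its parent's ID.
for old, new in zip(original, tampered):
    print(old[:7], "->", new[:7], "changed" if old != new else "same")
```

The second and third commits have the same tree, message, author, and dates in both histories. Their IDs differ only because their parent IDs differ.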

The possibility of a SHA-1 collision has already been considered. Long story short, in a conflict the existing object wins.

The probability of a SHA-1 collision happening accidentally is vanishingly small. If it worries you, I hope your asteroid, cosmic ray, and wolf attack insurance is paid up.

If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (3.6 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
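The figures in that quote can be sanity-checked with the usual birthday approximation, p ≈ 1 − exp(−n² / (2·2¹⁶⁰)), where n is the number of objects. The sketch below is a rough check only: it takes the population and object counts from the quote, assumes SHA-1 output is uniformly random, and uses variable names chosen for this example.

```python
import math

HASH_BITS = 160                        # SHA-1 output size
PEOPLE = 6.5e9                         # from the quote
OBJECTS_PER_PERSON_PER_SECOND = 3.6e6  # one Linux kernel history per second
SECONDS_PER_YEAR = 365 * 24 * 3600


def collision_probability(n: float, bits: int = HASH_BITS) -> float:
    """Birthday approximation: chance of at least one collision among n objects."""
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))


objects_per_year = PEOPLE * OBJECTS_PER_PERSON_PER_SECOND * SECONDS_PER_YEAR

# Probability of a collision after two years of this workload.
p_two_years = collision_probability(2 * objects_per_year)

# Number of objects needed for a 50% chance, and how long that takes.
n_half = math.sqrt(2.0 * math.log(2.0) * 2.0 ** HASH_BITS)
years_to_half = n_half / objects_per_year

print(f"collision probability after 2 years: {p_two_years:.2f}")
print(f"years until 50% probability: {years_to_half:.1f}")
```

Under these assumptions the probability after two years comes out close to one half, which agrees with the "roughly 2 years" in the quote.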

Seriously, there are better things to worry about, like the 1 in 100 chance of a drive failure. How are your backups?

answered Sep 27 '22 21:09 by Schwern