
Git Commit Generation Numbers


What are Git commit generation numbers (Hacker News link), and what is their significance?

Sirish asked Jul 15 '11 04:07





1 Answer

Just to add to siri's answer, "Commit Generation Numbers" are:

  • explained here:

A commit's generation is its height in the history graph, as measured from the farthest root. It is defined as:

  • If the commit has no parents, then its generation is 0.
  • Otherwise, its generation is 1 more than the maximum of its parents' generations.

  • an old topic already mentioned at the creation of Git in 2005:

Linus Torvalds (yesterday, July 14th):
Ok, so I see that the old discussion about generation numbers has resurfaced.
And I have to say, with six years of git use, I think it's not a coincidence that the notion of generation numbers has come up several times over the years: I think the lack of them is literally the only real design mistake we have.
[...]
It actually came up as early as July 2005, so the "let's use generation numbers in commits" thing is really old.

  • about the question of quickly knowing if a commit is an ancestor of another commit (without having to walk back the DAG -- the graph of commits --):

I think it's entirely reasonable to say that the issue basically boils down to one git question: "can commit X be an ancestor of commit Y" (as a way to basically limit certain algorithms from having to walk all the way down). We've used commit dates for it, and realistically it really has worked very well. But it was always a broken heuristic.

So yes, I personally see generation counters as a way to do the commit date comparisons right. And it would be perfectly fine to just say "if there are no generation numbers, we'll use the datestamps instead, and know that they could be incorrect".

That "use the datestamps" fallback thing may well involve all the heuristics we already do (ie check for the stamps looking sane, and not trusting just one individual one).
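To make the difference between the two checks concrete, here is a minimal Python sketch. It is not Git's code: the toy history, the timestamps and the function names are all made up for illustration. It applies the definition quoted earlier (a root has generation 0, any other commit has 1 more than the maximum of its parents' generations) and compares it with the commit-date heuristic when one clock is wrong.

```python
from functools import lru_cache

# Hypothetical toy history: commit name -> (parent names, commit timestamp).
# "C" was committed on a machine whose clock was running behind.
HISTORY = {
    "A": ((), 1000),
    "B": (("A",), 1100),
    "C": (("B",), 900),   # skewed clock: the child looks older than its parent
    "D": (("C",), 1200),
}


@lru_cache(maxsize=None)
def generation(commit):
    """0 for a root, otherwise 1 + the maximum generation of the parents.

    Recursion is fine for a toy graph; a real history would need an
    iterative walk to avoid hitting the recursion limit.
    """
    parents, _timestamp = HISTORY[commit]
    if not parents:
        return 0
    return 1 + max(generation(p) for p in parents)


def may_be_ancestor_by_date(x, y):
    """Heuristic: x can only be an ancestor of y if x is not newer than y."""
    return HISTORY[x][1] <= HISTORY[y][1]


def may_be_ancestor_by_generation(x, y):
    """Necessary condition: an ancestor always has a strictly smaller generation."""
    return generation(x) < generation(y)


# "B" really is the parent of "C", but the date heuristic rules it out.
print(may_be_ancestor_by_date("B", "C"))        # False (wrong)
print(may_be_ancestor_by_generation("B", "C"))  # True
```

Note that both functions only answer "can X possibly be an ancestor of Y". A positive answer still has to be confirmed by walking the graph; the point is that a negative answer from the generation check can be trusted, while one from the date check cannot.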

As the Hacker News thread mentions:

Generation numbers are a result of the state of the tree, while timestamps are derived from the ambient (and potentially incorrect!) environment from which the commit was made.

At the moment, each commit stores a reference to the parent tree.
By parsing that tree and reading the entire history you can obtain a hierarchy of commits.
Because you need to order commits in many situations, reading the entire history is extremely inefficient, so git uses timestamps to determine the ordering of commits.
This of course fails if the system clock on a given machine is off.
With a generation number, you can get an ordering locally from the latest commits, without having to rely on timestamps or read the entire tree.

When you have a commit with generation n, any later commits that include it would have generation >n, so to tell the relation between commits, you only need look as far back as n, and you can immediately get the order of any intermediate commits.
It has nothing to do with "easy to remember". It's about making git more efficient and robust.
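The "only look as far back as n" idea can be sketched as a pruned graph walk. This is an illustration in Python, not how Git implements it. It assumes, hypothetically, that every commit already records its own generation, and the `COMMITS` table and `is_ancestor` helper are invented for the example.

```python
# Hypothetical toy history, assuming each commit records its own generation:
# commit name -> (generation, parent names).
COMMITS = {
    "root": (0, ()),
    "a1": (1, ("root",)),
    "a2": (2, ("a1",)),
    "b1": (1, ("root",)),
    "b2": (2, ("b1",)),
    "merge": (3, ("a2", "b2")),
}


def is_ancestor(x, y):
    """Answer "is x a (strict) ancestor of y?".

    Returns (answer, number of commits visited) so the pruning is visible.
    """
    target_gen = COMMITS[x][0]
    stack, seen = [y], set()
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        for parent in COMMITS[commit][1]:
            if parent == x:
                return True, len(seen)
            # A commit at or below x's generation cannot have x as an
            # ancestor, so the walk never goes deeper than generation n.
            if COMMITS[parent][0] > target_gen:
                stack.append(parent)
    return False, len(seen)


print(is_ancestor("a1", "merge"))  # (True, 3)
print(is_ancestor("a2", "b2"))     # (False, 1): never walks down to the root
```

Without the generation check, the negative case would have to walk every ancestor of `b2` down to the root before it could answer "no".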

  • not redundant:

Generation numbers are completely redundant with the actual structure of history represented by the parent pointers.

Linus:

Not true. That's only true if you add "... if you parse the whole history" to that statement.
And we've never parsed the whole history, because it's just too expensive and doesn't scale. So right now we depend on commit dates with a few hacks.
So no, generation numbers are not at all redundant. They are fundamental. It's why we had this discussion six years ago.


There is still a debate about where to cache that information (or whether it should be cached at all). From the user's point of view, though, the discussion keeps coming back to having some "easy to remember" identifier, which isn't the goal of commit generation numbers:

So it's almost, but not quite, like the revision numbers everyone else has always had?

Yes -- almost, but not quite.
If you and I each create a branch off of a commit with gen #123, then, as I understand it, the subsequent commits in my branch would be #124, #125, etc., and your commits in your branch would also be #124, #125, etc.

Contrast this:

  • with CVS, where I would have 1.124.1.1, 1.124.1.2, etc., and you would have 1.124.2.1, 1.124.2.2, or
  • with Subversion, where I might get revisions 125, 128, and 129, while the server gave your commits #124, 127 and 130, and someone else, on a totally different part of the project got #126.

As long as development proceeds linearly, on a single branch, then yeah, it's about the same as revision numbers in a centralized RCS -- once you start branching and merging, though, it represents a different concept entirely.
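A small simulation shows why generation numbers are not unique identifiers once branches diverge. This is only a sketch in Python: the `commit` helper and the integer commit ids are hypothetical stand-ins, and the generation is assigned at commit time from the parents, following the definition quoted earlier.

```python
import itertools

_ids = itertools.count(1)   # stand-in for commit hashes
GENERATION = {}             # commit id -> generation number


def commit(*parents):
    """Create a commit; its generation is 1 + max of its parents' (0 for a root)."""
    cid = next(_ids)
    if parents:
        GENERATION[cid] = 1 + max(GENERATION[p] for p in parents)
    else:
        GENERATION[cid] = 0
    return cid


# Linear history up to generation 123.
tip = commit()
for _ in range(123):
    tip = commit(tip)

# Two people branch off the same commit and each add two commits.
mine = [commit(tip)]
mine.append(commit(mine[-1]))
yours = [commit(tip)]
yours.append(commit(yours[-1]))

# Merging the two branches yields one more than the larger parent generation.
merged = commit(mine[-1], yours[-1])

print([GENERATION[c] for c in mine])   # [124, 125]
print([GENERATION[c] for c in yours])  # [124, 125]
print(GENERATION[merged])              # 126
```

Both branches contain distinct commits that share generations 124 and 125, so a generation number alone cannot name a commit the way a centralized revision number does.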

For a single repository, it does have a very similar interpretation to, say, svn revnos.
You can speak of "revision #125 of a branch" in a specific repository. Which is generally exactly what you need for human-to-human communication about development.
"Can you see if that bug is in r125 of unstable?" "I've got all changes up to r245 of prod"
I guess the confusing aspect would be if "r245 of prod" in the central server was "r100 of prod" in my local repo because I haven't cloned the full history?

VonC answered Sep 20 '22 20:09
