Git commits not ordered by commit date

Question

We have a build process that creates product builds ordered by commit date, but it turns out that this is not always the correct order.

Two recent commits:

Commit A

Author date:    22 hours ago (7/22/2019 16:56:46)
Commit date:    22 hours ago (7/22/2019 16:57:50)

Commit B

Author date:    22 hours ago (7/22/2019 16:57:22)
Commit date:    22 hours ago (7/22/2019 16:57:44)

That is the order they appear in the repository - commit B is last, and contains changes from commit A. Yet the first commit has a date 6 seconds later than the second one. As a result, the build system assigned build numbers in the wrong order.

Does this mean the commit date is not a reliable way to order commits?

torek · Accepted Answer

There are several different points to be addressed here.

First, as you show, every commit has two date-and-time-stamps embedded in it. One is the author date and the other is the committer date. You can view both dates with git log using --pretty=fuller, for instance (there are other ways but this is simple and easy).

Next, commits can have parent / child relationships, as you mention in your second comment. More precisely:

- Each commit has a unique hash ID. The hash ID is, in effect, the "true name" of the commit. git log generally prints these hash IDs, so that git log --pretty=fuller starts with commit <hash>.
- Each commit also records some number of parent hash IDs. Most commits store one parent hash ID. What this means is that each child commit knows who its parent is, even though the parent commits don't know who their children are. In other words, the linkage goes only one way: from child, to parent.

The reason for the latter is that commits—in fact, all stored Git objects—are permanently frozen from the time they are created. This is because a hash ID, which is what Git uses to name and find each object, is actually just a checksum of the contents of the object as stored in the Git object database. If you take an object out of the database, fiddle with its contents in any way, and write that back, you get a new and different checksum. The original object remains unchanged. Anyone using the original hash ID gets the original object.
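As a rough illustration of why edited contents always get a new name, here is a minimal Python sketch of how an object ID can be computed. It assumes a repository using Git's default SHA-1 object format, where the ID is the SHA-1 of a short header (`<type> <size>` plus a NUL byte) followed by the contents. The function name git_object_id is made up for this example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 repositories do: header, NUL byte, body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Two blobs that differ by a single character get unrelated IDs.
original = git_object_id("blob", b"hello\n")
edited = git_object_id("blob", b"hello!\n")

print(original)
print(edited)
print(original != edited)
```

Because the ID is derived from the contents, "changing" an object can only ever produce a second object under a different ID; the first one is untouched.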

The way the git log command works is a little bit complicated, but it starts out pretty simple. A branch name like master or develop or release simply holds one hash ID. This hash ID locates one particular commit. That one particular commit is the tip commit of the branch. That is the definition of how branch names work in Git: whatever hash ID is stored in the branch name, that commit is the tip commit of that branch. Git changes the stored hash ID so as to change which commit is the branch-tip:

... <-F <-G <-H   <--master

Here, the uppercase letters stand in for real hash IDs. The name master holds the raw hash ID of the last commit in the branch. Git uses that to find the commit itself: the hash ID in master is the key that Git looks up in its big database of "all objects in the repository", and commit H is the result. Git reads commit H and finds that H's parent is the hash ID of commit G, so now git log can fish commit G out of the repository. Commit G has its usual information—including the two date-and-time stamps, and as G's parent, the hash ID of commit F.

To add a new commit to the master branch, Git writes out a new commit—which gets some random-looking hash ID, but we'll just call it I—that has its two date-and-time stamps, and that has the hash ID of commit H as its parent. Git gets the hash ID for H from the name master. The writing-out of the new commit is what assigns it its unique hash ID. Now that commit I exists in the repository, Git simply overwrites master with the new hash ID:

... <-F <-G <-H <-I   <--master
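The bookkeeping described here can be sketched in a few lines of Python. This is a toy model, not Git's implementation: the objects and branches dictionaries, the Commit class, and the commit function are all invented for illustration, and the hash is computed over a Python repr rather than Git's real object format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)          # frozen: a stored commit never changes
class Commit:
    parents: tuple               # hash IDs of the parent commits
    committer_ts: int            # committer timestamp (seconds, made up)
    message: str


objects = {}    # hash ID -> Commit, a stand-in for the object database
branches = {}   # branch name -> hash ID of the tip commit


def commit(branch: str, message: str, ts: int) -> str:
    """Write a new commit whose parent is the current tip, then move the name."""
    tip = branches.get(branch)
    new = Commit(parents=(tip,) if tip else (), committer_ts=ts, message=message)
    new_id = hashlib.sha1(repr(new).encode()).hexdigest()  # toy hash only
    objects[new_id] = new        # writing the object is what gives it its ID
    branches[branch] = new_id    # overwrite the branch name with the new ID
    return new_id


h = commit("master", "H", 100)
i = commit("master", "I", 200)
print(branches["master"] == i, objects[i].parents == (h,))
```

Note that commit H is never modified when I is added: only the name master changes.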

Hence, what git log does—at least for simple cases like this—is:

- Extract the hash ID for the current branch.
- Show that commit (with its date-and-time stamps).
- Follow that commit to its parent. Show that commit.
- Follow that commit to its parent. Show that commit.
- Repeat until you run out of commits.

The result is that the output of git log is in graph order, starting with the tip of the branch and working backwards. The date-and-time-stamps stored in these commits don't matter. There are more complicated cases for git log where they do matter, but let's start with this. Fundamentally, git log works backwards, commit-by-commit, through the graph formed by the links that connect a child commit to its parent.
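To make this concrete with the two commits from the question, here is a small Python sketch of the simple walk. The dictionary layout and the walk function are assumptions made for this example, and commit A is treated as having no parent purely to keep it short.

```python
# Child -> parent links, plus the committer dates quoted in the question.
commits = {
    "B": {"parent": "A", "committer_date": "2019-07-22 16:57:44"},
    "A": {"parent": None, "committer_date": "2019-07-22 16:57:50"},
}


def walk(tip):
    """Follow parent links from the tip; timestamps are never consulted."""
    order = []
    current = tip
    while current is not None:
        order.append(current)
        current = commits[current]["parent"]
    return order


print(walk("B"))  # ['B', 'A']
```

B comes out first even though its committer date is six seconds earlier than A's, because only the parent link decides the order here.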

Who puts the date-and-time stamps into the commits?

By default, git commit creates a new commit with both timestamps set to "now". But "now" is determined by your computer's clock. If your computer's clock is wrong, the timestamps will be wrong.

You can override the author timestamp pretty easily: many Git commands, including git commit itself, take a flag, such as --date=date, to set whatever author timestamp you want. Overriding the committer timestamp is a bit harder as there is no flag, but not actually hard because git commit reads environment variables. The environment variable GIT_COMMITTER_DATE can be set to the same kinds of string values that the --date option accepts; setting this forces the committer timestamp to whatever value you like (within the range of dates Git can represent, that is).
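As a hedged sketch of what that looks like from a script, the Python below only builds the command line and environment for such a commit; it does not run anything. The helper name backdated_commit and the example dates are made up, while --date and GIT_COMMITTER_DATE are the two mechanisms described above.

```python
import os


def backdated_commit(message, author_date, committer_date):
    """Return (argv, env) for a git commit with both timestamps overridden."""
    # --date sets the author timestamp.
    argv = ["git", "commit", "-m", message, f"--date={author_date}"]
    # There is no flag for the committer timestamp, so use the environment.
    env = dict(os.environ, GIT_COMMITTER_DATE=committer_date)
    return argv, env


argv, env = backdated_commit(
    "Backdated commit",
    author_date="2019-07-22T16:56:46",
    committer_date="2019-07-22T16:57:50",
)
print(argv)
print(env["GIT_COMMITTER_DATE"])

# To actually create the commit inside a repository, one could then call:
#   subprocess.run(argv, env=env, check=True)
```

The point is not that you should do this, but that nothing in Git prevents it, so the stored timestamps are only as trustworthy as whoever made the commit.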

When does `git log` sort by date?

There are two ways that git log gets into situations in which it "wants" to show more than one commit at a time. One is when you tell git log which commits to show:

git log master develop

says, for instance, to show the tip commit of branch master and the tip commit of branch develop:

             I--J   <-- develop
            /
...--F--G--H   <-- master

Which one should git log show first? Ideally, it might be J, since J goes back to I and then to H. In practice, git log chooses whichever commit has the greater committer timestamp, unless you set various other git log options to override it. In most cases, that is commit J and everything works nicely.

Another case occurs when your commit graph has merge commits in it. A merge commit is simply any commit with two or more parents. ("More" is rare, and not really special, nor especially useful; it suffices to consider the case of two parents.) That is, suppose we have, at some point, this graph:

       I--J   <-- master
      /
...--H
      \
       K--L   <-- feature

While things are like this, in this repository, we git checkout master and then git merge feature. If all goes well, the result is:

       I--J
      /    \
...--H      M   <-- master
      \    /
       K--L   <-- feature

Commit M here is a merge commit, which means it has more than one parent. Its two parents are commit J—the old tip of master, before Git overwrote the name master with hash ID M—and commit L, which is still the tip of feature. We can now safely delete the name feature because we can find commit L by starting at commit M and working backwards.
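The "safe to delete" claim is really a statement about reachability, which the following Python sketch checks on this graph. The parents table and the reachable function are invented for the example, and H is treated as the first commit for brevity.

```python
# Each commit lists its parents; M is the merge with two of them.
parents = {
    "H": [],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],
}


def reachable(tips):
    """Return every commit found by walking parent links from the tips."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Before the merge, master (at J) cannot reach L; after it (at M), it can.
print("L" in reachable(["J"]))                   # False
print(reachable(["M"]) == reachable(["M", "L"]))  # True
```

Once master points at M, the name feature adds nothing to the set of findable commits.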

If we do delete the name feature, and maybe add another commit to master, we end up with:

       I--J
      /    \
...--H      M--N   <-- master
      \    /
       K--L

Now git log will start by showing us commit N. Then it will move to N's parent M, and show us M. Then it will ... well, now what?

The trick git log uses here (and with our git log master develop example too) is a priority queue. This priority queue is, initially, totally empty. You run:

git log ...

and give it a list of starting points. If you don't give it any, git log picks the current branch's tip commit as the (single) starting point. Git turns the branch name(s), if any, into their hash IDs, and stuffs all the hash IDs into this priority queue.

Now that the queue has some entries, Git takes the highest priority commit out of the queue. By default, that's the one with the highest committer timestamp. But if the queue only has one commit in it, Git just takes the one commit. There's nothing to compare: there's just the one commit in the queue! Note that this takes the only commit that's in the queue, out of the queue, so that now the queue is empty.

That's the commit that git log will show now.¹ Git fetches the actual commit out of the all-objects database and shows it. Fetching the commit also gives git log the hash IDs of the parent commit(s). Git puts those into the queue.² If there's just the one parent, and the queue got emptied by pulling out its one entry, now there's just one entry in the queue again.

So, for a simple chain that starts at some branch tip and works backwards without any merges, this queue algorithm just shows each commit, one at a time, in the backwards order that Git uses. The last commit comes out first, and the first one comes out last.

But when Git hits a merge commit like M, Git puts both parents into the queue. Now the queue has two entries, so now its priority sorting takes effect. Again, the default priority is that newer commits—those with higher committer timestamps—go to the front of the queue. Git will show the later (by committer date) commit next, and put its parent(s) into the queue. If this commit's parents have higher priority than any other commits in the queue, those parents get shown next.

In other words, the actual git log loop is not just show commit, show parent, show parent, ... but rather:

Put all command-line commits into the queue.
While the queue is not empty:
- Pick the next commit from the queue.
- Show it (or not, with details depending on git log options).
- Put its parent(s) into the queue (with details depending on git log options).

The queue itself determines the order in which commits get shown, but the parent linkage determines the ability for commits to get into the queue. You prime the queue with your command-line git log command. After that, the parent linkages and queue mechanics take over.
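The loop above can be simulated in Python with a heap. This is a sketch of the simple model only, not Git's actual code: the graph table, its timestamps, and the log_order function are made up, and the graph is the merge example with commits H through N.

```python
import heapq

# commit -> (committer timestamp, parents); the timestamps are invented.
graph = {
    "N": (80, ["M"]),
    "M": (70, ["J", "L"]),
    "J": (50, ["I"]),
    "L": (45, ["K"]),
    "K": (35, ["H"]),
    "I": (30, ["H"]),
    "H": (10, []),
}


def log_order(starts):
    """Show commits in priority-queue order, newest committer timestamp first."""
    queue, queued, shown = [], set(), []

    def push(commit):
        if commit not in queued:          # never queue the same commit twice
            queued.add(commit)
            # heapq is a min-heap, so negate the timestamp for newest-first.
            heapq.heappush(queue, (-graph[commit][0], commit))

    for commit in starts:                 # prime the queue from the command line
        push(commit)
    while queue:
        _, commit = heapq.heappop(queue)
        shown.append(commit)
        for parent in graph[commit][1]:
            push(parent)
    return shown


print(log_order(["N"]))  # ['N', 'M', 'J', 'L', 'K', 'I', 'H']
```

With these timestamps the two legs of the merge come out interleaved (J, L, K, I), which is exactly the point at which the dates start to matter.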

¹Depending on options to git log, maybe it won't actually show the commit. Let's leave that for other questions, though.

²Again, there can be complications here, but let's stick with the simple model. Just remember that git log can do what Git calls History Simplification, which can omit some legs of branches, as well as not bother to show some commits. Also, Git won't show the same commit twice in one git log, so if this would put an already-shown commit into the queue, it doesn't put it in now.

Conclusion

While git log does default to using committer timestamps to sort commits, that happens only if/when there are multiple entries in the "to be shown" queue. Each commit's parent hash IDs get put into the queue at the time the commit gets shown. So for simple linear chains, git log just walks those chains in their internal, backwards order. It's mainly at merges where the sort order becomes important (or, of course, if you use more than one branch name, or use --all to look at all branches).

The date-and-time-stamps can be wrong (because the computer's clock was wrong) or spoofed (on purpose, for good or bad reasons). So even if there's nothing malicious going on, you cannot rely on them to put commits in order.
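One way to see how often this bites is to look for child commits whose committer date is earlier than a parent's. The Python sketch below does that for the two commits in the question; the data layout and the skewed_pairs function are assumptions made for this example, and A is treated as having no parent.

```python
from datetime import datetime

# Committer dates as quoted in the question; B's parent is A.
commits = {
    "A": {"parents": [], "committer_date": datetime(2019, 7, 22, 16, 57, 50)},
    "B": {"parents": ["A"], "committer_date": datetime(2019, 7, 22, 16, 57, 44)},
}


def skewed_pairs(commits):
    """List (child, parent, seconds) where the child looks older than its parent."""
    found = []
    for child, info in commits.items():
        for parent in info["parents"]:
            gap = commits[parent]["committer_date"] - info["committer_date"]
            if gap.total_seconds() > 0:
                found.append((child, parent, gap.total_seconds()))
    return found


print(skewed_pairs(commits))  # [('B', 'A', 6.0)]
```

Any pair reported here is a place where ordering by committer date disagrees with the graph.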

The --graph option to git log, which draws crude ASCII representations of the parent/child relationships, also forces --topo-order, which changes the priorities in the priority queue. Under --topo-order, a parent commit gets shown only after all of its children that will be shown have been shown.
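The children-before-parents rule can be sketched in Python as a plain topological sort. This is only an illustration of that one constraint, not Git's implementation (the real --topo-order also tries to keep lines of history from being intermixed). The parents table and topo_order function are made up for the example.

```python
from collections import deque

parents = {
    "N": ["M"], "M": ["J", "L"],
    "J": ["I"], "I": ["H"],
    "L": ["K"], "K": ["H"],
    "H": [],
}


def topo_order(tips):
    """Emit each commit only after every one of its children has been emitted."""
    # Find everything reachable from the starting points.
    reachable, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            stack.extend(parents[commit])
    # Count, for each commit, how many children are still waiting to be shown.
    pending = {commit: 0 for commit in reachable}
    for commit in reachable:
        for parent in parents[commit]:
            pending[parent] += 1
    ready = deque(sorted(c for c in reachable if pending[c] == 0))
    order = []
    while ready:
        commit = ready.popleft()
        order.append(commit)
        for parent in parents[commit]:
            pending[parent] -= 1
            if pending[parent] == 0:      # all children shown: parent is ready
                ready.append(parent)
    return order


order = topo_order(["N"])
print(order)
```

No timestamps appear anywhere in this sketch, so a wrong clock cannot put a parent ahead of its child.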
