
How does ancestry path work with git log?


I've read the git log documentation, but I still find it very difficult to understand what the --ancestry-path option does. I see different ways to invoke git log:

$ git log origin/master..HEAD
$ git log --ancestry-path origin/master..HEAD

With the first command, I get a list of commits that are on HEAD but not on origin/master; basically, this shows me what is on my branch that isn't merged.

In the second command, I get nothing. If I change to 3 dots (...) it shows me something, but I'm not sure how to make sense of it. Basically, how is the addition of --ancestry-path any different? What exactly does it simplify?

void.pointer asked Apr 05 '16 17:04




2 Answers

Matthieu Moy's answer is correct but may not help you very much, if you haven't been exposed to the necessary graph theory.

DAGs

First, let's take a quick look at Directed Acyclic Graphs, or DAGs. A DAG is just a graph (hence the g), i.e., a collection of nodes and the connections between them. Think of train stations on rail lines, where the stations are the nodes. The graph is "directed" (the d: trains only run one way) and has no loops in it (the a).

Linear chains and tree structures are valid DAGs (note: newer commits are to the right, in general, here):

o <- o <- o 

or:

       o <- o
      /
o <- o
      \   o
       \ /
        o
         \
          o <- o

(imagine the diagonal connections having arrow heads so that they point up-and-left or down-and-left, as needed).

However, non-tree graphs can have nodes that merge back (these are git's merges):

       o <- o
      /      \
o <- o        \
      \   o    \
       \ /      \
        o        o
         \      /
          o <- o

or:

     o--o
    /    \
o--o      o--o
    \    /
     o--o

(I'm just compressing the notation further here, nodes still generally point leftward).

Next, git's .. notation does not mean what most people usually first think it means. In particular, let's take a look at this graph again, add another node, and use some single letters to mark particular nodes:

     o---o
    /     \
A--o       \
    \   B   \
     \ /     \
      o       C--D
       \     /
        o---o

And, let's do one more thing, and stop thinking about this as just git log but rather the more general case of "selecting revisions with ancestry".

Selecting revisions (commits), with ancestry

If we select revision A, we get just revision A, because it has no ancestors (nothing to the left of it).

If we select revision B we get this piece of the graph:

A--o
    \   B
     \ /
      o

This is because select-with-ancestry means "Take the commit I identify, and all the commits I can get to by following the arrows back out of it." Here the result is somewhat interesting, but not very interesting since there are no merges and following the arrows nets us a linear chain of four commits, starting from B and going back to A.

Selecting either C or D with ancestry, though, gets us much further. Let's see what we get with D:

     o---o
    /     \
A--o       \
    \       \
     \       \
      o       C--D
       \     /
        o---o

This is, in fact, everything except commit B. Why didn't we get B? Because the arrows all point leftward: we get D, which points to C, which points to two un-lettered commits; those two point left, and so on, but when we hit the node just left-and-down of B, we aren't allowed to go rightward, against the arrow, so we can't reach B.
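To make "select with ancestry" concrete, here is a minimal Python sketch that models the lettered graph as a table of parent links. It is only a toy model, not how Git itself is implemented, and the names x, t1, t2, m, b1 and b2 are made-up labels for the unlabeled o commits.

```python
# Toy model of the lettered graph: each commit maps to its parents.
# x, t1, t2, m, b1, b2 are hypothetical labels for the unlabeled "o" commits.
PARENTS = {
    "A": [],
    "x": ["A"],                  # the o just right of A
    "t1": ["x"], "t2": ["t1"],   # top row
    "m": ["x"],                  # the o down-and-left of B
    "B": ["m"],
    "b1": ["m"], "b2": ["b1"],   # bottom row
    "C": ["t2", "b2"],           # the merge commit
    "D": ["C"],
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


selected_a = ancestors(PARENTS, "A")
selected_b = ancestors(PARENTS, "B")
selected_d = ancestors(PARENTS, "D")
missing_from_d = set(PARENTS) - selected_d

print(sorted(selected_a))      # ['A']
print(sorted(selected_b))      # ['A', 'B', 'm', 'x']
print(sorted(missing_from_d))  # ['B']
```

Selecting B yields the four-commit chain back to A, and selecting D yields everything except B, because no parent arrow ever leads to B from D.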

Two-dot notation

Now, the two-dot notation in git is really just shorthand syntax for set subtraction.[1] That is, if we write B..D for instance, it means: "Select D with ancestry, and then select B with ancestry, and then give me the set of commits from the D selection after excluding (subtracting away) all commits from the B selection."

Selecting D with ancestry gets the entire graph except for the B commit. Subtracting away the B selection removes A, the two o nodes we drew earlier, and B. How can we remove B when it's not in the set? Easy: we just pretend to remove it and say we're done! That is, set subtraction only bothers to remove things that are actually in the set.

The result for B..D is therefore this graph:

     o---o
          \
           \
            \
             \
              C--D
             /
        o---o
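The set subtraction can be sketched in a few lines of Python. This is a toy model of the graph, not Git's implementation, and x, t1, t2, m, b1 and b2 are made-up labels for the unlabeled o commits; two_dot is a hypothetical helper name.

```python
# Toy model: commit -> list of parents (labels for "o" commits are hypothetical).
PARENTS = {
    "A": [], "x": ["A"],
    "t1": ["x"], "t2": ["t1"],   # top row
    "m": ["x"], "B": ["m"],      # m is the o down-and-left of B
    "b1": ["m"], "b2": ["b1"],   # bottom row
    "C": ["t2", "b2"], "D": ["C"],
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def two_dot(parents, excluded, included):
    """Model of excluded..included as plain set subtraction."""
    return ancestors(parents, included) - ancestors(parents, excluded)


b_to_d = two_dot(PARENTS, "B", "D")
d_to_b = two_dot(PARENTS, "D", "B")

print(sorted(b_to_d))  # ['C', 'D', 'b1', 'b2', 't1', 't2']
print(sorted(d_to_b))  # ['B']
```

Swapping the two sides gives a different answer: D..B contains only B, since everything else reachable from B is also reachable from D.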

Three-dot notation

The three-dot notation is different. It's more useful in a simple branch-y graph, perhaps even a straight tree. Let's start with the tree-like graph this time and look at both two- and three-dot notation. Here's our tree-like graph, with some single letter names for nodes put in:

     o--I
    /
G--H
    \   J
     \ /
      K
       \
        o--L

This time I've added extra letters because we'll need to talk about some of the places the commits "join up", in particular at nodes H and K.

Using two-dot notation, what do we get for L..I? To find the answer, start at node I and work backwards. You must always move leftward, even if you also go up or down. These are the commits that are selected. Then, start at node L and work backwards, finding the nodes to un-select; if you come across any earlier selected ones, toss them out. (Making the final list is left as an exercise, though I'll put the answer in as a footnote.[2])

Now let's see the three-dot notation in action. What it does is a bit complicated, because it must find the merge base between two branches in the graph. The merge base has a formal definition,[3] but for our purposes it's just: "The point where, when following the graph backwards, we meet up at some commit."

In this case, for instance, if we ask for L...I or I...L—both produce the same result—git finds all commits that are reachable from either commit, but not from both. That is, it excludes the merge base and all earlier commits, but keeps the commits beyond that point.

The merge base of L and I (or I and L) is commit H, so we get things after H, but not H itself, and we cannot reach node J from either I or L since it's not in their ancestry. Hence, the result for I...L or L...I is:

     o--I




      K
       \
        o--L

(Note that these histories do not join up, since we tossed out node H.)
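The same tree-like graph can be modeled in Python to check both notations. This is a sketch under stated assumptions: p and q are made-up labels for the o commits before I and L, the helper names are hypothetical, and the merge-base rule used here (a common ancestor that is not an ancestor of any other common ancestor) is a simplified stand-in for what Git computes.

```python
# Toy model of the tree-like graph: commit -> list of parents.
TREE = {
    "G": [], "H": ["G"],
    "p": ["H"], "I": ["p"],   # upper branch; p is a hypothetical label
    "K": ["H"], "J": ["K"],
    "q": ["K"], "L": ["q"],   # lower branch; q is a hypothetical label
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def three_dot(parents, a, b):
    """Model of a...b: reachable from either side, but not from both."""
    return ancestors(parents, a) ^ ancestors(parents, b)


l_to_i = ancestors(TREE, "I") - ancestors(TREE, "L")   # L..I
bases = merge_bases(TREE, "L", "I")
either_not_both = three_dot(TREE, "L", "I")             # L...I

print(sorted(l_to_i))           # ['I', 'p']
print(sorted(bases))            # ['H']
print(sorted(either_not_both))  # ['I', 'K', 'L', 'p', 'q']
```

Neither H nor J shows up in the three-dot result: H is reachable from both sides, and J is reachable from neither.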

--ancestry-path

Now, all these are ordinary selection operations. None have been modified with --ancestry-path. The documentation for git log and git rev-list—these two are almost the same command, except for their output format—describes --ancestry-path this way:

When given a range of commits to display (e.g. commit1..commit2 or commit2 ^commit1), only display commits that exist directly on the ancestry chain between the commit1 and commit2, i.e. commits that are both descendants of commit1, and ancestors of commit2.

We define ancestors here in terms of the commit DAG: a first commit is a direct ancestor of a second if the second has an arrow pointing back at the first, and an indirect ancestor if the second points back at the first through some chain of commits. (For selection purposes a commit is also considered an ancestor of itself.)

Descendants (also sometimes called children) are defined similarly, but by going against the arrows in the graph. A commit is a child (or descendant) of another commit if, following the arrows, there's a path from it back to that other commit.

Note that the description of --ancestry-path talks about using the two-dot notation, not the three-dot notation, probably because the implementation of the three-dot notation is a little bit weird inside. As noted earlier, B...D excludes (as if with leading ^) the merge base (or bases, if there is more than one) of the two commits, so the merge base is the one that plays the "must be child-of" role. I'll mention how --ancestry-path works with this, though I'm not sure how useful it is in "real world" examples.

Practical examples

What does this mean in practice? Well, it depends on the arguments you give, and the actual commit DAG. Let's look at the funky loopy graph again:

     o---o
    /     \
A--o       \
    \   B   \
     \ /     \
      o       C--D
       \     /
        o---o

Suppose we ask for B..D here without --ancestry-path. This means we take commit D and its ancestors, but exclude B and its ancestors, just as we saw before. Now let's add --ancestry-path. Everything we had earlier was an ancestor of D, and that's still true, but this new flag says we must also toss out commits that are not children of B.

How many children does node B have? Well, none! So we must toss out every commit, giving us a completely empty list.
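The extra filter can be sketched as one more set intersection. As before, this is a toy Python model rather than Git's code; x, t1, t2, m, b1 and b2 are made-up labels for the unlabeled o commits, and ancestry_path is a hypothetical helper name. The second range, m..D, is an extra example chosen here to show a non-empty result.

```python
# Toy model: commit -> list of parents (labels for "o" commits are hypothetical).
PARENTS = {
    "A": [], "x": ["A"],
    "t1": ["x"], "t2": ["t1"],   # top row
    "m": ["x"], "B": ["m"],      # m is the o down-and-left of B
    "b1": ["m"], "b2": ["b1"],   # bottom row
    "C": ["t2", "b2"], "D": ["C"],
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(parents, start):
    """Return start plus every commit that can reach start via parent arrows."""
    return {node for node in parents if start in ancestors(parents, node)}


def ancestry_path(parents, excluded, included):
    """Model of --ancestry-path excluded..included."""
    plain = ancestors(parents, included) - ancestors(parents, excluded)
    return plain & descendants(parents, excluded)


from_b = ancestry_path(PARENTS, "B", "D")
from_m = ancestry_path(PARENTS, "m", "D")

print(sorted(from_b))  # []
print(sorted(from_m))  # ['C', 'D', 'b1', 'b2']
```

In this model B..D comes out empty because B has no descendants other than itself, while m..D keeps only the bottom row, C and D: the top-row commits are ancestors of D but not descendants of m.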


What if we ask for B...D, without the special --ancestry-path option? That gives us everything reachable from either D or B, but excludes everything reachable from both D and B:

     o---o
          \
           \
        B   \
             \
              C--D
             /
        o---o

This is the same as B..D except that we get node B as well.

[Note: the section below on mixing --ancestry-path with B...D was wrong for almost a year, between April 2016 and Feb 2017. It has been fixed to note that the "must be child" part starts from the merge base(s), not from the left side of the B...D notation.]

Suppose we add --ancestry-path here. We start with the same graph we just got for B...D without --ancestry-path, but then discard items that are not children of the merge base. The merge base is the o just down-and-left of B. The top row o commits are not children of this node, so they are discarded. Again, as with ancestors, we consider a node its own child, so we would keep this node itself, giving this partial result:

        B
       /
      o       C--D
       \     /
        o---o

But, while we are (or --ancestry-path is) discarding non-children of this merge base node, the merge base node itself, to the down-and-left of B, was not in the B...D graph in the first place. Hence, the final result (actually tested in Git 2.10.1) is:

        B

              C--D
             /
        o---o

(Again, I'm not really sure how useful this is in practice. The starting graph, again, is that of B...D: everything reachable from either commit, minus everything reachable from both commits. This works by discarding everything reachable from any merge base, if there are two or more. The child-of checking code also handles a list of commits: it retains everything that is a child of any of the merge bases. See the function limit_to_ancestry in revision.c.)
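The three-dot case can be modeled the same way: start from the "either but not both" set, then keep only children of the merge base(s). This Python sketch is an approximation of the behavior described above, not a port of limit_to_ancestry; the labels x, t1, t2, m, b1, b2 and all helper names are hypothetical.

```python
# Toy model: commit -> list of parents (labels for "o" commits are hypothetical).
PARENTS = {
    "A": [], "x": ["A"],
    "t1": ["x"], "t2": ["t1"],   # top row
    "m": ["x"], "B": ["m"],      # m is the o down-and-left of B
    "b1": ["m"], "b2": ["b1"],   # bottom row
    "C": ["t2", "b2"], "D": ["C"],
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(parents, start):
    """Return start plus every commit that can reach start via parent arrows."""
    return {node for node in parents if start in ancestors(parents, node)}


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def three_dot_ancestry_path(parents, a, b):
    """Model of --ancestry-path a...b: keep children of any merge base."""
    plain = ancestors(parents, a) ^ ancestors(parents, b)
    kept = set()
    for base in merge_bases(parents, a, b):
        kept |= descendants(parents, base)
    return plain & kept


bases = merge_bases(PARENTS, "B", "D")
result = three_dot_ancestry_path(PARENTS, "B", "D")

print(sorted(bases))   # ['m']
print(sorted(result))  # ['B', 'C', 'D', 'b1', 'b2']
```

The merge base m is a child of itself, but it drops out anyway because it was never part of the B...D set.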

Thus, it depends on the graph and the selectors

The final action of X..Y or X...Y, with or without --ancestry-path, depends on the commit graph. To predict it, you must draw the graph. (Use git log --graph, perhaps with --oneline --decorate --all, or use a viewer that draws the graph for you.)


[1] There's an exception in git diff, which does its own special handling for X..Y and X...Y. When you are not using git diff you should just ignore its special handling.

[2] We start with I and the o to its left, and also H and G. Then we lose H and G when we work back from L, so the result is just o--I.

[3] The formal definition is that the merge base is the Lowest Common Ancestor, or LCA, of the given nodes in the graph. In some graphs there may be multiple LCAs; for Git, these are all merge bases, and X...Y will exclude all of them.

It's interesting / instructive to run git rev-parse B...D for the graph I drew. These commit hashes here depend on not just the graph itself, and the commit, but also the time stamps at which one makes the commits, so if you build this same graph, you will get different hashes, but here are the ones I got while revising the answer to fix the description of --ancestry-path interacting with B...D:

$ git rev-parse B...D
3f0490d4996aecc6a17419f9cf5a4ab420c34cc2
7f0b666b4098282301a9f95e056a646483c2e5fc
^843eaf75d78520f9a569da35d4e561a036a7f107

but we can see that these are D, B, and the merge base, in that order, using several more commands:

$ git rev-parse B     # this produces the middle hash
7f0b666b4098282301a9f95e056a646483c2e5fc

and:

$ git rev-parse D     # this produces the first hash
3f0490d4996aecc6a17419f9cf5a4ab420c34cc2

and:

$ git merge-base B D  # this produces the last, negated, hash
843eaf75d78520f9a569da35d4e561a036a7f107

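Graphs with multiple merge bases do occur, but they're somewhat harder to construct. The easy way is with "criss cross" merges, where each branch merges the other's pre-merge tip: git checkout br1; git merge br2; git checkout br2; git merge br1^. (Merging br1 itself in that last step would, by default, just fast-forward br2 to the new merge commit, leaving a single merge base.) If you get this situation and run git rev-parse br1...br2 you will see several negated hashes, one per merge base. Run git merge-base --all br1 br2 and you will see the same set of merge bases.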
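A criss-cross history is small enough to model directly. In this Python sketch, which is a toy model and not Git's merge-base algorithm, R, a1, b1, M1 and M2 are hypothetical commit names: a1 and b1 are the two branch tips before merging, and M1 and M2 are the two merge commits, each with both old tips as parents.

```python
# Toy criss-cross graph: commit -> list of parents (all names hypothetical).
CRISS_CROSS = {
    "R": [],
    "a1": ["R"],          # tip of br1 before any merge
    "b1": ["R"],          # tip of br2 before any merge
    "M1": ["a1", "b1"],   # br1 after merging br2's old tip
    "M2": ["b1", "a1"],   # br2 after merging br1's old tip
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


bases = merge_bases(CRISS_CROSS, "M1", "M2")
either_not_both = ancestors(CRISS_CROSS, "M1") ^ ancestors(CRISS_CROSS, "M2")

print(sorted(bases))            # ['a1', 'b1']
print(sorted(either_not_both))  # ['M1', 'M2']
```

Neither a1 nor b1 is an ancestor of the other, so both count as merge bases, and the three-dot selection is left with just the two merge commits.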

torek answered Sep 23 '22 23:09


As the documentation says, --ancestry-path removes commits that are not descendants of origin/master. If you have a local, unmerged branch, and this branch is based on a commit which is older than origin/master, then commits in this branch will not be shown because these commits are not descendants of origin/master.
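The situation in the question can be sketched with a small Python model. This is an illustration only: the commit names c1, c2, c3, f1, f2, g1 and g2 are hypothetical, with c3 standing in for origin/master, f2 for a HEAD whose branch forked from the older commit c2, and g2 for a HEAD whose branch was built on top of c3.

```python
# Toy history: commit -> list of parents (all names hypothetical).
HISTORY = {
    "c1": [], "c2": ["c1"],
    "c3": ["c2"],                # stands in for origin/master
    "f1": ["c2"], "f2": ["f1"],  # branch forked from c2, older than c3
    "g1": ["c3"], "g2": ["g1"],  # branch built on top of c3
}


def ancestors(parents, start):
    """Return start plus every commit reachable by following parent arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def two_dot(parents, excluded, included, ancestry_path=False):
    """Model of excluded..included, optionally with the --ancestry-path filter."""
    selected = ancestors(parents, included) - ancestors(parents, excluded)
    if ancestry_path:
        # Keep only commits that have the excluded commit in their ancestry.
        selected = {c for c in selected if excluded in ancestors(parents, c)}
    return selected


old_base_plain = two_dot(HISTORY, "c3", "f2")
old_base_path = two_dot(HISTORY, "c3", "f2", ancestry_path=True)
new_base_plain = two_dot(HISTORY, "c3", "g2")
new_base_path = two_dot(HISTORY, "c3", "g2", ancestry_path=True)

print(sorted(old_base_plain))  # ['f1', 'f2']
print(sorted(old_base_path))   # []
print(sorted(new_base_plain))  # ['g1', 'g2']
print(sorted(new_base_path))   # ['g1', 'g2']
```

When the branch forks from a commit older than origin/master, the plain range lists its commits but the --ancestry-path version is empty; when the branch sits on top of origin/master, both give the same list.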

Matthieu Moy answered Sep 22 '22 23:09