Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where does a Git branch start and what is its length?

People also ask

How long should branches exist Git?

One key rule is the length of life of the branch before it gets merged and deleted. Simply put, the branch should only last a couple of days. Any longer than two days, and there is a risk of the branch becoming a long-lived feature branch (the antithesis of trunk-based development).

What exactly is a branch in Git?

A branch represents an independent line of development. Branches serve as an abstraction for the edit/stage/commit process. You can think of them as a way to request a brand new working directory, staging area, and project history.

What are the three types of branching in Git?

The two primary branches in Git flow are main and develop. There are three types of supporting branches with different intended purposes: feature, release, and hotfix.


In Git, you could say that every branch starts at the root commit, and that would be quite literally true. But I guess that's not very helpful for you. What you could do instead is to define "the start of a branch" in relation to other branches. One way you can do this is to use

git show-branch branch1 branch2 ... branchN

and that will show you the common commit between all specified branches at the bottom of the output (if there is, in fact, a common commit).

Here's an example from the Linux Kernel Git documentation for show-branch

$ git show-branch master fixes mhf
* [master] Add 'git show-branch'.
 ! [fixes] Introduce "reset type" flag to "git reset"
  ! [mhf] Allow "+remote:local" refspec to cause --force when fetching.
---
  + [mhf] Allow "+remote:local" refspec to cause --force when fetching.
  + [mhf~1] Use git-octopus when pulling more than one heads.
 +  [fixes] Introduce "reset type" flag to "git reset"
  + [mhf~2] "git fetch --force".
  + [mhf~3] Use .git/remote/origin, not .git/branches/origin.
  + [mhf~4] Make "git pull" and "git fetch" default to origin
  + [mhf~5] Infamous 'octopus merge'
  + [mhf~6] Retire git-parse-remote.
  + [mhf~7] Multi-head fetch.
  + [mhf~8] Start adding the $GIT_DIR/remotes/ support.
*++ [master] Add 'git show-branch'.

In that example, master is being compared with the fixes and mhf branches. Think of this output as a table, with each branch represented by its own column, and each commit getting its own row. Branches that contain a commit will have a + or - show up in their column in the row for that commit.

At the very bottom of the output, you'll see that all 3 branches share a common ancestor commit, and that it is in fact the head commit of master:

*++ [master] Add 'git show-branch'.

This means that both fixes and mhf were branched off of that commit in master.

Alternative solutions

Of course that's only 1 possible way to determine a common base commit in Git. Other ways include git merge-base to find common ancestors, and git log --all --decorate --graph --oneline or gitk --all to visualize the branches and see where they diverge (though if there are a lot of commits that becomes difficult very quickly).

Other questions from original poster

As for these questions you had:

Is commit D a member of both branches or can we clearly decide whether it belongs to branch-A or branch-B?

D is a member of both branches, it's an ancestor commit for both of them.

Supervisors sometimes like to know, when a branch has been started (it usually marks the start of a task)...

In Git, you can rewrite the history of the entire commit tree(s) and their branches, so when a branch "starts" is not as set in stone as in something like TFS or SVN. You can rebase branches onto any point in time in a Git tree, even putting it before the root commit! Therefore, you can use it to "start" a task at any point in time in the tree that you want.

This is a common use case for git rebase, to sync branches up with the latest changes from an upstream branch, to push them "forward" in time along the commit graph, as if you had "just started" working on the branch, even though you've actually been working on it for a while. You could even push branches back in time along the commit graph, if you wanted to (though you might have to resolve a lot of conflicts, depending on the branch contents...or maybe you won't). You could even insert or delete a branch from right in the middle of your development history (though doing so would probably change the commit shas of a lot of commits). Rewriting history is one of the primary features of Git that makes it so powerful and flexible.

This is why commits come with both an authored date (when the commit was originally authored), and a committed date (when the commit was last committed to the commit tree). You can think of them as analogous to create time-date and last-modified time-date.

Supervisors sometimes like to know...to which branch some changes belong to (to get the purpose of some change - was it required for the work).

Again, because Git allows you to rewrite history, you can (re)base a set of changes on pretty much any branch/commit in the commit graph that you want. git rebase literally allows you to move your entire branch around freely (though you might need to resolve conflicts as you go, depending on where you move the branch to and what it contains).

That being said, one of the tools you can use in Git to determine which branches or tags contains a set of changes is the --contains:

# Which branches contains commit X?
git branch --all --contains X

# Which tags contains commit X?
git tag --contains X

The bounty notice on this question asks,

I'd be interested in knowing whether or not thinking about Git branches as having a defined "beginning" commit other than the root commit even makes sense?

It kind of does except:

  • the root commit is "the first commit accessible from the branch HEAD" (and don't forget there can be multiple root commit with orphan branches, used for instance in GitHub for gh-pages)
  • I prefer considering the start of a branch being the commit of another branch from which said branch has been created (tobib's answer without the ~1), or (simpler) the common ancestor.
    (also in "Finding a branch point with Git?", even though the OP mentioned being not interested in common ancestors):

    git merge-base A master
    

That means:

  • the first definition gives you a fixed commit (which might never change except in case of massive filter-branch)
  • the second definition gives you a relative commit (relative to another branch) which can change at any time (the other branch can be deleted)

The second one makes more sense for git, which is all about merge and rebase between branches.

Supervisors sometimes like to know, when a branch has been started (it usually marks the start of a task) and to which branch some changes belong to (to get the purpose of some change - was it required for the work)

Branches are simply the wrong marker for that: due to the transient nature of branches (which can be renamed/moved/rebased/deleted/...), you cannot mimick a "change set" or an "activity" with a branch, to represent a "task".

That is an X Y problem: the OP is asking for an attempted solution (where does a branch starts) rather than the actual problem (what could be considered a task in Git).

To do that (representing a task), you could use:

  • tags: they are immutable (once associated to a commit, that commit is no longer supposed to move/be rebased), and any commits between two well-named tags can represent an activity.
  • some git notes to a commit to memorize to which "work item" said commit has been created (contrary to tags, notes can be rewritten if the commit is amended or rebased).
  • hooks (to associate a commit to some "external" object like a "work item", based on the commit message). That is what the bridge Git-RTC -- IBM Rational Team Concert -- does with a pre-receive hook) The point is: the start of a branch does not always reflect the start of a task, but merely the continuation of an history which can change, and whom sequence should represent a logical set of changes.

Perhaps you are asking the wrong question. IMO, it doesn't make sense to ask where a branch starts since a given branch includes all changes made to every file ever (i.e. since the initial commit).

On the other hand, asking where two branches diverged is definitely a valid question. In fact, this seems to be exactly what you want to know. In other words, you don't really want to know information about a single branch. Instead you want to know some information about comparing two branches.

A little bit of research turned up the gitrevisions man page which describes the details of referring to specific commits and ranges of commits. In particular,

To exclude commits reachable from a commit, a prefix ^ notation is used. E.g. ^r1 r2 means commits reachable from r2 but exclude the ones reachable from r1.

This set operation appears so often that there is a shorthand for it. When you have two commits r1 and r2 (named according to the syntax explained in SPECIFYING REVISIONS above), you can ask for commits that are reachable from r2 excluding those that are reachable from r1 by ^r1 r2 and it can be written as r1..r2.

So, using the example from your question, you can get the commits where branch-A diverges from master with

git log master..branch-A

There are two separate concerns here. Starting from your example,

A - B - C - - - - J     [master]
     \
      \       F - G     [branch-A]
       \     /
        D - E
             \
              H - I     [branch-B]

[...] Supervisors sometimes like to know, when a branch has been started (it usually marks the start of a task) and to which branch some changes belong (to get the purpose of some change - was it required for the work)

two factual observations before we get to the meat:

First observation: what your supervisor wants to know is the mapping between commits and some external workorder-ish record: what commits address bug-43289 or featureB? Why are we changing strcat usage in longmsg.c? Who's going to pay for the twenty hours between your previous push and this one? The branch names themselves don't matter here, what matters is the commits' relationships with external administrative records.

Second observation: whether branch-A or branch-B gets published first (via say merge or rebase or push), the work in commits D and E has to go with it right then and not be duplicated by any subsequent operation. It makes no difference at all what was current when those commits were made. Branch names don't matter here, either. What matters is commits' relations with each other via the ancestry graph.


So my answer is, so far as any history is concerned branch names don't matter at all. They're convenience tags showing which commit is current for some purpose specific to that repo, nothing more. If you want some useful moniker in the default merge-commit message subject line, git branch some-useful-name the tip before merging, and merge that. They're the same commits either way.

Tying whatever branch name the developer had checked out at the time of commit with some external record -- or anything at all -- is deep into "all fine so long as everything works" territory. Don't Do It. Even with the restricted usage common in most VCS's, your D-E-{F-G,H-I} will occur sooner rather than later, and then your branch naming conventions will have to be adapted to handle that, and then something more complicated will show up, . . .

Why bother? Put the report number(s) prompting the work in a tagline at the bottom of your commit messages and be done with it. git log --grep (and git in general) is blazingly fast for good reason.

Even a fairly flexible prep hook to insert taglines like this is trivial:

branch=`git symbolic-ref -q --short HEAD`                     # branch name if any
workorder=`git config branch.${branch:+$branch.}x-workorder`  # specific or default config
tagline="Acme-workorder-id: ${workorder:-***no workorder supplied***}"
sed -i "/^ *Acme-workorder-id:/d; \$a$tagline" "$1"

and here's the basic pre-receive hook loop for when you need to inspect every commit:

while read old new ref; do              # for each pushed ref
        while read commit junk; do      # check for bad commits

                # test here, e.g. 
                git show -s $commit | grep -q '^ *Acme-workorder-id: ' \
                || { rc=1; echo commit $commit has no workorder associated; }
                # end of this test

        done <<EOD
        $(git rev-list $old..$new)
EOD
done
exit $rc

The kernel project uses taglines like this for copyright signoff and code-review recording. It really couldn't get much simpler or more robust.

Note that I did some hand-mangling after c&p to de-specialize real scripts. Keyboard-to-editbox warning


I think this is probably a good opportunity for education. git doesn't really record the starting point of any branch. Unless the reflog for that branch still contains the creation record, there's no way to definitively determine where it started, and if the branch has merges in it anywhere, it may in fact have more than one root commit, as well as many different possible points where it might have been created and started to diverge from its original source.

It might be a good idea to ask a counter question in such cases - why do you need to know where it branched from, or does it matter in any useful way where it branched from? There might or might not be good reasons that this is important - many of the reasons are probably tied up in the specific workflow your team has adopted and is trying to enforce, and may indicate areas where your workflow might be improved in some way. Perhaps one improvement would be figuring out what the "right" questions to ask - for example, rather than "where did branch-B branch from", maybe "what branches do or don't contain the fixes/new features introduced by branch-B"...

I'm not sure that a completely satisfactory answer to this question really exists...


From a philosophical point of view, the question of a branch's history cannot be answered in a global sense. However, the reflog does track each branch's history in that particular repository.

Thus, if you have a single central repository that everyone pushes to, you can use its reflog to track this information (some more details in this question). First, on that central repository, ensure the reflog is recorded and never gets cleaned out:

$ git config core.logAllRefUpdates true
$ git config gc.reflogExpire never

Then you can run git reflog <branchname> to inspect the branch's history.

Example

I reproduced your sample commit graph with a few pushes into a test repository. Now I can do this sort of thing:

$ git log --graph --all --oneline --decorate
* 64c393b (branch-b) commit I
* feebd2f commit H
| * 3b9dbb6 (branch-a) commit G
| * 18835df commit F
|/  
* d3840ca commit E
* b25fd0b commit D
| * 8648b54 (master) commit J
| * 676e263 commit C
|/  
* 17af0d2 commit B
* bdbfd6a commit A

$ git reflog --date=local master branch-a branch-b
64c393b branch-b@{Sun Oct 11 21:45:03 2015}: push
3b9dbb6 branch-a@{Sun Oct 11 21:45:17 2015}: push
18835df branch-a@{Sun Oct 11 21:43:32 2015}: push
8648b54 master@{Sun Oct 11 21:42:09 2015}: push
17af0d2 master@{Sun Oct 11 21:41:29 2015}: push
bdbfd6a master@{Sun Oct 11 21:40:58 2015}: push

So you can see that in my example, when branch-a first came into existence, it was pointed at commit F, and that a second push to the central server moved it forward to commit G. Whereas when branch-b first came into existence, it was pointed at commit I, and it hasn't seen any updates yet.

Caveats

This shows only the history as it was pushed to the central repo. If for example a coworker started branch-A at commit A, but then rebased it onto commit B before pushing it, that information would not be reflected in the central repository's reflog.

This also doesn't provide a definitive record of where a branch began. We can't really say for sure which branch "owns" commits D and E that initially forked off of master. Did they get created on branch-a and then picked up by branch-b, or the other way around?

Both branches initially appeared on the central repository containing those commits, and the reflog does tell us which branch showed up on the central repository first. However, these commits may have been "passed around" amongst several end-user repositories, via format-patch, etc. So even though we know which branch pointer was responsible for carrying them to the central server first, we don't know their ultimate origin.


Some architectural details

Git stores the revisions to a repository as a series of commits. These commits contain a link to information about the changes to the files since the last commit and, importantly, a link to the previous commit. In broad terms a branch's commit history is a singly linked list from the most recent revision all the way back to the repository's root. The state of the repository at any commit is that commit combined with all the commits prior to it all the way back to the root.

So what's the HEAD? And what's a branch?

The HEAD is a special pointer to the newest commit in the currently active branch. Each branch, including the master1, is also a pointer to the latest revision in it's history.

Clear as mud? Let's look at an example using an image from the Pro Git book, that will hopefully clarify things somewhat.2

Simple Git Tree

In this diagram we have a relatively simple repository with 4 commits. 98ca9 is the root. There are two branches, master and testing. The master branch is at commit f30ab while the testing branch is at 87ab2. We are currently working in the master branch, so the HEAD is pointing to the master branch. The history for the branches in our sample repository are (from newest to oldest):

testing:  87ab2 -> f30ab -> 34ac2 -> 98ca9
 master:           f30ab -> 34ac2 -> 98ca9

From this we can see that the two branches are the same starting at f30ab, so we can also say that testing was branches at that commit.

The Pro Git book goes into a lot more detail, and it's definitely worth a read.

Now we can address--

The Specific Question

Fancifying the diagram we get:

Fancy like a pinky up drinking tea.

Is commit D a member of both branches or can we clearly decide whether it belongs to branch-A or branch-B?

Knowing what we now know, we can see that commit D is a member of both chains leading from the branch pointers to the root. Therefore we can say that D is a member of both branches.

Which branch started at E, which one at B?

Both branch-A and branch-B originate from the master branch at B, and diverge from each other at E. Git itself doesn't distinguish which branch owns E. At their core, the branches are just the chain of commits from newest to oldest ending up at the root.


1Fun Fact: The master branch is just an ordinary branch. It is no different from any other branch.

2The Pro Git book is licensed with the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.