Incremental linearizing of git DAG

Question

I'm the author of GitX. One of the features GitX has is the visualization of branches, as can be seen here.

This visualization is currently done by reading commits which are emitted from git in the correct order. For each commit the parents are known, so it's fairly easy to build up the lanes in the correct way.

I'd like to speed up this process by using my own commit pool and linearizing the commits myself. This allows me to reuse existing loaded commits and allows git to emit commits faster because it doesn't have to emit them in the correct order.

However, I'm not sure what algorithm to use to accomplish this. It is important that the building is incremental, as the loading of commits can take a long time (>5 seconds for 100,000 commits, which should all be displayed).

Gitk has gone the same way, and there's a patch here that shows how it is implemented, but my TCL skills are weak and the patch isn't very thoroughly commented and a bit hard to follow.

I'd also like this algorithm to be efficient, as it'll have to handle hundreds of thousands of commits. It also has to be displayed in a table, so it's important that access to specific rows is fast.

I'll describe the input I have so far, the output that I want and a few observations.

Input:

I have a current pool of commits in the form of a hash table that maps commit ids to commit objects. This pool does not have to be complete (have all commits necessary)
I have a separate thread loading in new commits from git, with a callback that can be called every time a new commit is loaded. There is no guaranteed order in which the commits come in, but in most of the cases the next commit is a parent of the previous commit.
A commit object has its own revision id and the revision ids of all its parents
I have a list of branch heads that should be listed. That is, there isn't a single 'top' of the DAG that should be displayed. There also does not have to be a single graph root.

Output:

I'll need to linearize these commits in topological order. That is, a commit cannot be listed after its parents have been listed.
I also need the 'branch lines' that can be seen in the screenshot above. These probably need to be precomputed as most of them depend on their children.

A few remarks:

It's necessary to relocate a list of commits. For example, we might have to commits (branch heads) that are unrelated, until a commit shows up which makes one head an ancestor of the other.
Multiple branch tips must be shown
It's important that this process is incremental, so that at least a partial view is available while the data is still loading. This means that new data has to be inserted halfway and that the branch lines have to be readjusted.

obecalp · Accepted Answer

The standard topological sort is O(n) (OK, O(V+E)), i.e. you should be able to sort a million commits in memory in a fraction of a second. No incremental hack like those in Tcl is needed.

BTW, I use GitX (looks much better than Gitk on OS X) everyday and don't have any issue with it (maybe because I don't have those crazy merges in my repositories) :)

Incremental linearizing of git DAG

Tags:

git

visualization

objective-c

directed-acyclic-graphs

Pieter

1 Answers

obecalp

Recent Activity

Donate For Us

Incremental linearizing of git DAG

Tags:

git

visualization

objective-c

directed-acyclic-graphs

Pieter

1 Answers

obecalp

Related questions

Recent Activity

Donate For Us