
Git and log order

Tags:

git

I was trying to create a linear order from "git log" output, but all my attempts failed. What I need to do is map a commit to the next release that contains that commit. I cannot run

git tag --contains <commit>

for each commit, as our repository contains a very large number of commits (more than 300,000).

First I tried using

git log --pretty=format:"%ct%H" | sort --key=1,10 

to obtain a linear order based on commit time. However, this does not seem to produce a 100% accurate result. This leads to my first question:

Q1) How does git store commit times, when commits are pushed into the main repository? Does it store the current machine time for each commit, in UTC?

I also looked at "git help log", and the documentation states that by default, git log lists the commits in reverse chronological order. In my project, I checked whether I was introducing any error, but as far as I can tell, the code is correct, and the chronological order given by git log is not a linear order. So my second question is:

Q2) How can one obtain a linear order from "git log", given that git does not store revision numbers?

Thanks :)

asked Jun 04 '12 18:06 by leco


1 Answer

#1: How does git store commit times, when commits are pushed into the main repository? Does it store the current machine time for each commit, in UTC?

From man git-commit:

Git internal format
It is [unix timestamp] [timezone offset], where [unix timestamp] is the number of seconds since the UNIX epoch. [timezone offset] is a positive or negative offset from UTC.

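Based on this, git stores each commit time internally as UNIX epoch seconds together with the UTC offset of the machine that made the commit. The timestamp comes from the clock of the machine that creates the commit; pushing to the main repository does not change it.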
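To look at the stored value for a particular commit, you can inspect the raw commit object. The following is a minimal sketch, assuming a POSIX shell with awk; the function name committer_time is made up for illustration. It reads a raw commit object (as printed by git cat-file -p <commit>) on stdin and prints the committer timestamp and offset.

```shell
# Print "<epoch> <offset>" from the committer header of a raw commit object
# read on stdin. committer_time is a hypothetical helper, not a git command.
committer_time() {
  # The header looks like: committer Name <email> <epoch> <offset>
  # so the last two fields are the timestamp and the UTC offset.
  awk '/^committer / { print $(NF-1), $NF; exit }'
}
```

Usage: git cat-file -p HEAD | committer_time. The first number is the same value that the %ct placeholder prints in git log.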

#2: How can one obtain a linear order from "git log", given that git does not store revision numbers?

The method you've used (git log --pretty=format:"%ct%H") will pull data from all branches that have been merged into the current branch.

This makes a "linear order" somewhat difficult. Consider the following [source: git-scm.org]:

[Figure: Multiple branches, pre-merge]

So, here we've got several 'topic branches' being worked on. We then decide to keep some (dumbidea and iss91v2), discarding others (iss91). So we're discarding C5 and C6, keeping the other commits, and our post-merge history looks like this [source: git-scm.org]:

[Figure: Post-merge]

(Arrows point from children to parents; C14 is the child of commits C13 and C11).

So now we've got a single HEAD commit which, for argument's sake, we'll assume we're going to release as RELEASE1 or something. So, to the question: how can we now, having this history, extract a linear, chronologically correct list of commits?

Simple answer: I don't believe you can - or, if you can, I don't believe it'll be what you want.

You could sort the commits linearly, by time:

git log --pretty=format:"%ct %H" | sort --numeric-sort --key=1,1

That'll give you a list corresponding to:

C1
C2
... snip ...
C13
C14

Note, however, that this isn't actually a linear history! This is because we've merged some branches together, which were created at the same time. We can't extract a linear history of the parents of C14 (our HEAD), because there isn't one - it's the child of two branches, not the child of a single commit, and that isn't a linear relationship.
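Merges are not the only obstacle. Committer timestamps come from each contributor's own clock, so a commit can carry an earlier time than one of its parents, and a sort by time then puts the child first. A rough way to check a repository for this, assuming a POSIX shell with awk (the function name commits_older_than_parent is hypothetical):

```shell
# Read "<hash> <commit-time> <parent hashes...>" lines on stdin and print
# "<child> <parent>" for every child whose commit time is earlier than that
# parent's. commits_older_than_parent is a hypothetical helper name.
commits_older_than_parent() {
  awk '
    {
      time[$1] = $2
      parents[$1] = ""
      for (i = 3; i <= NF; i++) parents[$1] = parents[$1] " " $i
    }
    END {
      for (c in parents) {
        n = split(parents[c], p, " ")
        for (i = 1; i <= n; i++)
          # Only compare against parents that appeared in the input.
          if ((p[i] in time) && time[p[i]] + 0 > time[c] + 0)
            print c, p[i]
      }
    }'
}
```

Usage: git log --format='%H %ct %P' | commits_older_than_parent. Any output means timestamp order disagrees with ancestry somewhere. If what you need is an order that never lists a parent after its child, git log --topo-order --reverse gives one, though it is only one of many valid linearizations.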

So, you argue, perhaps I could get a linear history of just one branch? C14 -> C13 ... C3 -> C1, for example?

This, too, is at minimum very difficult and (more likely) impossible.

This problem is compounded when we've got multiple branches joining (3- or more-way merges). This question goes into some more detail of the reasons you can't extract history of a 'single branch' - when you're looking at the parents of a merge-commit, how do you decide which is the 'single branch' and which is the 'joining-in' branch?
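Git does record the order of a merge commit's parents, and git log --first-parent follows only the first one at each merge. The first parent is whatever was checked out when the merge ran, so whether this chain matches "the branch" depends on how merges were made in your repository. Treat the following as a heuristic sketch rather than a true single-branch history; the function name first_parent_history is hypothetical.

```shell
# List "<commit-time> <hash>" along the first-parent chain of a ref,
# oldest first. Defaults to HEAD when no ref is given.
# first_parent_history is a hypothetical helper; the git flags are real.
first_parent_history() {
  git log --first-parent --reverse --format='%ct %H' "${1:-HEAD}"
}
```

Usage: first_parent_history master. Commits that only arrived through the second parent of a merge are left out, which is why the list is linear.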


Having said all that, if you examine, for example, the logs for this little repository, in graph format: (I did snip a few commits that weren't useful)


zsh% git log --graph --all --format=format:'%C(blue)%h%C(reset) - %C(green)(%cr)%C(reset) %C(yellow)%d%C(reset)' --abbrev-commit --date=relative
* 3cf5f06 - (8 weeks ago)  (origin/master, origin/HEAD, master)
* a3a3205 - (4 months ago) 
* c033bf9 - (4 months ago)  (origin/svg)
* ccee435 - (4 months ago) 
*   f08bc1e - (4 months ago) 
|\  
| * 48c4406 - (5 months ago) 
* | 203eeaa - (4 months ago) 
* | 5fb0ea9 - (5 months ago) 
|/  
* 39bccb8 - (5 months ago)

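Note that this history is listed newest first, though not strictly by time: --graph implies --topo-order, which keeps each line of development together, so 48c4406 (5 months ago) appears above the newer 203eeaa (4 months ago). The branches haven't been 'flattened' into one, so it looks a little funky. Each of these commits is contained in the current HEAD (master, origin/master). This is obvious, because both of the forks in the history have been merged together (the merge is at f08bc1e).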

#3: What I need to do is map a commit to the next release that contains that commit

If you're interested in an individual commit, this question will help, or, if your releases are tagged, the git tag --contains <commit> command from your question will do it.

Reading the question, it appears you may want to map each commit to a release. That's a lot of work, and I can't help much with it. I don't think you need to check each commit, though: branches get merged in, and if the head of a linear branch is in the release, its linear parents will be too (unless you've done cherry-picking or similar).

One approach: sort the commits by time, then take the releases oldest first. For each release, check every remaining commit that is older than it; if the release contains the commit, record the pairing and remove the commit from the list. In the worst case no commit is in any release, and you make (number of releases * number of commits) checks. In the best case each release contains every commit older than itself, and you make one check per commit, about 300,000 in total. Still a lot, but (to my naive mind) doable.
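A variation on this idea lets git do the containment checks in bulk rather than one commit at a time: visit the tags oldest first, and ask git rev-list for the commits reachable from each tag but not from any earlier tag. This is a sketch, not the exact procedure described above. It assumes every tag is a release, that tag creation dates reflect release order, a POSIX sh or bash, and a git version that supports git tag --sort=creatordate. The function name map_commits_to_first_release is hypothetical.

```shell
# Print "<commit> <tag>" for every commit reachable from a tag, paired with
# the oldest tag that contains it. Assumes all tags are releases.
# map_commits_to_first_release is a hypothetical helper name.
map_commits_to_first_release() {
  seen=""
  for tag in $(git tag --sort=creatordate); do
    # $seen is left unquoted on purpose so that each "^<tag>" exclusion
    # becomes its own argument (POSIX sh / bash word splitting).
    git rev-list "$tag" $seen | awk -v t="$tag" '{ print $1, t }'
    # Exclude everything reachable from this tag on later iterations.
    seen="$seen ^$tag"
  done
}
```

Usage: run map_commits_to_first_release > commit-to-release.txt inside the repository. Commits that are not reachable from any tag do not appear in the output, which matches the "no release contains it" case.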

(Apologies for the long reply).

answered Oct 01 '22 06:10 by simont