I was trying to create a linear order from "git log" output, but all my attempts failed. What I need to do is map a commit to the next release that contains that commit. I cannot run
git tag --contains <commit>
for each commit, as our repository contains an extremely large number of commits (more than 300,000).
First I tried using
git log --pretty=format:"%ct%H" | sort --key=1,10
to obtain a linear order based on commit time. However, this does not seem to produce a 100% accurate result. This leads to my first question:
Q1) How does git store commit times, when commits are pushed into the main repository? Does it store the current machine time for each commit, in UTC?
I also looked at "git help log", and the documentation states that by default, git log lists the commits in reverse chronological order. In my project, I checked whether I was introducing any error, but as far as I can tell, the code is correct, and the chronological order given by git log is not a linear order. So, my second question is:
Q2) How can one obtain a linear order from "git log", given that git does not store revision numbers?
Thanks :)
From man git-commit:
Git internal format
It is [unix timestamp] [timezone offset], where [unix timestamp] is the number of seconds since the UNIX epoch. [timezone offset] is a positive or negative offset from UTC.
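Based on this, the time format git uses internally is UNIX epoch time plus the machine's UTC offset. The timestamp comes from the clock of the machine that created the commit, at the moment the commit was created; pushing the commit to another repository does not change it.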
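To make the format concrete, here is a minimal Python sketch that parses a string of the form "[unix timestamp] [timezone offset]". The function name parse_git_raw_date is my own, not part of git, and the sketch assumes the offset always has the +HHMM or -HHMM shape:

```python
from datetime import datetime, timedelta, timezone


def parse_git_raw_date(raw: str) -> datetime:
    """Parse '<unix timestamp> <timezone offset>', e.g. '1300000000 +0100'.

    Hypothetical helper for illustration; assumes the offset is +HHMM or -HHMM.
    """
    seconds_text, offset_text = raw.split()
    sign = -1 if offset_text.startswith("-") else 1
    digits = offset_text.lstrip("+-")
    offset = sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:4]))
    # The timestamp fixes the instant; the offset only affects how it is displayed.
    return datetime.fromtimestamp(int(seconds_text), tz=timezone(offset))
```

Two values with the same timestamp but different offsets denote the same instant, so the offset does not affect sorting by time.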
The method you've used (git log --pretty=format:"%ct%H") will pull data from all branches that have been merged into the current branch.
This makes a "linear order" somewhat difficult. Consider the following [source: git-scm.org]:
So, here we've got several 'topic branches' being worked on. We then decide to keep some (dumbidea and iss91v2), discarding others (iss91). So we're discarding C5 and C6, keeping the other commits, and our post-merge history looks like this [source: git-scm.org]:
(Arrows point from children to parents; C14 is the child of commits C13 and C11).
So now we've got a single HEAD commit which, for argument's sake, we'll assume we're going to release as RELEASE1 or something. So, to the question: how can we now, having this history, extract a linear, chronologically correct list of commits?
Simple answer: I don't believe you can - or, if you can, I don't believe it'll be what you want.
You could sort the commits linearly, by time:
git log --pretty=format:"%ct %H" | sort --key=1,10
That'll give you a list corresponding to:
C1
C2
... snip ...
C13
C14
Note, however, that this isn't actually a linear history! This is because we've merged together branches that were developed in parallel. We can't extract a linear history of the parents of C14 (our HEAD), because there isn't one - it's the child of two branches, not the child of a single commit, and that isn't a linear relationship.
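To illustrate why a time sort and the commit graph can disagree, here is a small Python sketch. The history in it is made up (it is not the graph from the figures), and the clock skew on commit "C" is an assumption chosen to show the effect. It compares a sort by timestamp with a topological order, which only guarantees that every parent appears before its children; it needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical history: commit -> (committer timestamp, parents).
# "C" was created on a machine whose clock ran behind, so its
# timestamp is earlier than that of its own parent "B".
HISTORY = {
    "A": (100, []),
    "B": (200, ["A"]),
    "C": (150, ["B"]),
    "D": (180, ["A"]),
    "M": (300, ["C", "D"]),  # merge commit with two parents
}


def order_by_time(history):
    """Sort commits by timestamp only, like piping git log into sort."""
    return sorted(history, key=lambda commit: history[commit][0])


def order_by_ancestry(history):
    """Return one order in which every parent precedes its children."""
    graph = {commit: parents for commit, (_, parents) in history.items()}
    return list(TopologicalSorter(graph).static_order())


def respects_ancestry(order, history):
    """True if no commit is listed before one of its parents."""
    position = {commit: index for index, commit in enumerate(order)}
    return all(
        position[parent] < position[commit]
        for commit, (_, parents) in history.items()
        for parent in parents
    )
```

With this data the time sort lists "C" before its parent "B", while the topological order does not. Git has a similar option, git log --topo-order, but note that a topological order is still only one of several valid orders when branches ran in parallel.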
So, you argue, perhaps I could get a linear history of just one branch? C14 -> C13 ... C3 -> C1, for example?
This, too, is at minimum very difficult and (more likely) impossible.
This problem is compounded when we've got multiple branches joining (3- or more-way merges). This question goes into some more detail of the reasons you can't extract history of a 'single branch' - when you're looking at the parents of a merge-commit, how do you decide which is the 'single branch' and which is the 'joining-in' branch?
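One convention, and it is only a convention, is to treat the first parent of every merge as the 'single branch' and ignore the others; git log --first-parent does this. A minimal Python sketch of that walk, using a made-up parent map (the commit names and the order of the parents are assumptions for illustration):

```python
# Hypothetical parent map: commit -> list of parents, first parent first.
EXAMPLE_PARENTS = {
    "C14": ["C13", "C11"],  # merge: C13 is assumed to be the first parent
    "C13": ["C12"],
    "C12": ["C1"],
    "C11": ["C1"],
    "C1": [],
}


def first_parent_chain(parents, tip):
    """Follow only the first parent from tip back to a root commit."""
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        commit_parents = parents.get(current, [])
        current = commit_parents[0] if commit_parents else None
    return chain
```

The result is linear, but it silently drops everything that came in through the other parents (C11 in this example), and which parent is 'first' depends on how each merge was performed, so it may not be the list you actually want.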
Having said all that, if you examine, for example, the logs for this little repository, in graph format: (I did snip a few commits that weren't useful)
zsh% git log --graph --all --format=format:'%C(blue)%h%C(reset) - %C(green)(%cr)%C(reset) %C(yellow)%d%C(reset)' --abbrev-commit --date=relative
* 3cf5f06 - (8 weeks ago) (origin/master, origin/HEAD, master)
* a3a3205 - (4 months ago)
* c033bf9 - (4 months ago) (origin/svg)
* ccee435 - (4 months ago)
* f08bc1e - (4 months ago)
|\
| * 48c4406 - (5 months ago)
* | 203eeaa - (4 months ago)
* | 5fb0ea9 - (5 months ago)
|/
* 39bccb8 - (5 months ago)
Note that this history is in chronological order; the branches haven't been 'flattened' into one, though, so it looks a little funky. Each of these commits is contained in the current HEAD (master, origin/master). This is obvious, because both of the forks in the history have been merged together (the merge is at f08bc1e).
If you're interested in an individual commit, this question will help, as will git tag --contains <commit> if your releases are tagged.
Reading the question, it appears you may want to map each commit to a release; that's a lot of work, and I can't help much with that. I don't think you need to check each commit, though: branches will be merged in, and if the head of a linear branch is in the release, its linear parents will be too (unless you've done cherry-picking, or similar).
If you sorted the commits by time, you could then go through the releases from oldest to newest: for each release, check every remaining commit older than that release, record the release against each commit it contains, and remove those commits from the list. You'd have to make at most (number of releases * number of commits) checks; that worst case occurs when no commit is in any release. In the best case, every release contains every commit older than itself, which comes to 300,000 checks. Still a lot, but (to my naive mind) doable.
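Here is a rough Python sketch of that idea, with one variation: instead of testing each commit against each release, it walks backwards from each release tip through the parent links and stops as soon as it reaches a commit that an older release has already claimed. It assumes you have already built a parent map (for example by parsing the output of git rev-list --parents) and a list of releases sorted oldest first; the function name and the example data are made up.

```python
def map_commits_to_first_release(parents, releases):
    """Map each commit to the oldest release that contains it.

    parents:  dict of commit -> list of parent commits (assumed input).
    releases: list of (release name, tip commit), oldest release first.
    Commits not reachable from any release are left out of the result.
    """
    first_release = {}
    for name, tip in releases:
        stack = [tip]
        while stack:
            commit = stack.pop()
            if commit in first_release:
                # Already in an older release, and so are all its ancestors.
                continue
            first_release[commit] = name
            stack.extend(parents.get(commit, []))
    return first_release


# Hypothetical history: "e" merges "c" and "d"; "g" is not released yet.
EXAMPLE_PARENTS = {
    "a": [], "b": ["a"], "c": ["b"], "d": ["b"],
    "e": ["c", "d"], "f": ["e"], "g": ["f"],
}
EXAMPLE_RELEASES = [("v1", "c"), ("v2", "f")]
FIRST_RELEASE = map_commits_to_first_release(EXAMPLE_PARENTS, EXAMPLE_RELEASES)
```

Because each commit is claimed once and the walk stops at already-claimed commits, the work is roughly proportional to the number of commits and parent links rather than releases times commits. In the example, "a", "b" and "c" map to v1, "d", "e" and "f" map to v2, and "g" is absent because no release contains it.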
(Apologies for the long reply).