
How to preserve tags on git filter-branch for prune-empty or subdirectory filter

When rewriting history with git filter-branch --tag-name-filter cat …, using --prune-empty and/or --subdirectory-filter …, you can end up in the situation that commits that were tagged are removed. That's reasonable so far and works as designed.

the question / goal

What I now want to achieve is: preserve the tags on a nearby rewritten commit.

example:

starting from A -> B(tag: foo) -> C -> D -> E

(where E is newer than D newer than C …)

running git filter-branch I get either

  • A' -> B'(tag: foo)' -> E' ( ^ the good case )

  • or: A' -> D' -> E' ( ^ the bad case )

What I'm trying to get then is: A'(tag: foo)' -> D' -> E' since A' represents what has been tagged in B

some research: the first thing I stumbled over was git cherry in "Git: Is there a way to figure out where a commit was cherry-pick'ed from?", but this does not seem to help much in finding the differences, since the trees are disjoint.

Instead, I already found a useful example of --commit-filter https://stackoverflow.com/a/14783391/529977 to write a log of the rewritten objects

some ideas: With that --commit-filter "mapping file" in mind, I would theoretically be able to

  1. filter all tags not set in the rewritten tree
    • can't find how to filter the tree for that information
  2. iterate the list of tags in doubt
  3. read the original commit point by git log --oneline -1 ${tag}
  4. look up the history of the original tree for any newer commits that are known to be rewritten
    • forward lookups are hard too
    • alternatively, walk down the history from any rewritten commit to find the tag
  5. move the tag to the first match in the new tree
    • known problem: how to preserve all information, I do not want to retag the classic way
  6. skip tag, if there are only commits rewritten after another tag
    • how to determine a commit in question has a tag

other ideas I had were:

  • find any "similar" commit by comparing git log -1 --format="%an%ae%at%cn%ce%ct%s" | sha1sum in the original tree, then traverse history down to the next known tag but this comes close to the idea above
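
The fingerprint idea can be sketched in Python, assuming the seven fields have already been read from git log for every commit on both sides. The Commit tuple and the fingerprint and match_rewritten helpers are hypothetical names, and joining the fields with a NUL separator is an added assumption (the one-liner above concatenates them directly). It only helps when the rewrite leaves author, committer and subject untouched.

```python
import hashlib
from typing import Dict, NamedTuple

class Commit(NamedTuple):
    # Field names mirror the git log placeholders %an %ae %at %cn %ce %ct %s.
    an: str
    ae: str
    at: str
    cn: str
    ce: str
    ct: str
    s: str

def fingerprint(c: Commit) -> str:
    """Hash the commit metadata into a stable identifier."""
    joined = "\0".join(c)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

def match_rewritten(old: Dict[str, Commit], new: Dict[str, Commit]) -> Dict[str, str]:
    """Map old commit ids to new commit ids that share a fingerprint."""
    by_print = {fingerprint(c): h for h, c in new.items()}
    result = {}
    for h, c in old.items():
        fp = fingerprint(c)
        if fp in by_print:
            result[h] = by_print[fp]
    return result
```

Old commits that are missing from the result are the ones that were pruned, which is where the history walk described above would still be needed.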

This still sounds like the hard way, and I don't even have a good idea how to solve these steps ... any other ideas or known solutions (?!) are welcome!

asked Aug 11 '17 10:08 by childno͡.de


2 Answers

Deleted:           *    *         *                   *    *         *
Tags:              R    S    T    U                        V         W
Commits: A -> B -> C -> D -> E -> F -> G -> H -> I -> J -> K -> L -> M -> N

Expected output:

Tags:         R    T              V    W
Commits: A -> B -> E -> G -> H -> I -> L -> N

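We will test this with --prune-empty, so the commits that should be deleted are created as empty commits. Let's set up the test repository. Note that the commits are created in the order N, M, L and so on down to A, so N is the oldest commit and A is the newest.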

git init

touch n && git add n && git commit -m "N"
git commit --allow-empty -m "M"
touch l && git add l && git commit -m "L"
git commit --allow-empty -m "K"
git commit --allow-empty -m "J"
touch i && git add i && git commit -m "I"
touch h && git add h && git commit -m "H"
touch g && git add g && git commit -m "G"
git commit --allow-empty -m "F"
touch e && git add e && git commit -m "E"
git commit --allow-empty -m "D"
git commit --allow-empty -m "C"
touch b && git add b && git commit -m "B"
touch a && git add a && git commit -m "A"

git tag W $(git log --pretty=oneline --grep=M | cut -d " " -f1)
git tag V $(git log --pretty=oneline --grep=K | cut -d " " -f1)
git tag U $(git log --pretty=oneline --grep=F | cut -d " " -f1)
git tag T $(git log --pretty=oneline --grep=E | cut -d " " -f1)
git tag S $(git log --pretty=oneline --grep=D | cut -d " " -f1)
git tag R $(git log --pretty=oneline --grep=C | cut -d " " -f1)

To begin with we are going to create a file containing all the tag names and the commit hashes they point to.

for i in $(git tag); do echo $i; git log -1 --pretty=oneline $i | cut -d " " -f1; done > ../tags

When running git filter-branch the commit hashes will change. To keep track of those changes we create a file that maps the old commit hashes to the new ones. The trick is a --commit-filter that logs each pair, like the one linked in the question.

The --subdirectory-filter command would then look like this (filter-branch takes the directory as a separate argument):

git filter-branch --subdirectory-filter ... --commit-filter 'echo -n "${GIT_COMMIT}," >>/tmp/commap; git commit-tree "$@" | tee -a /tmp/commap'

Since the --prune-empty option cannot be combined with --commit-filter, we need to change something. The documentation of --prune-empty helps here:

Some filters will generate empty commits that leave the tree untouched. This option instructs git-filter-branch to remove such commits if they have exactly one or zero non-pruned parents; merge commits will therefore remain intact. This option cannot be used together with --commit-filter, though the same effect can be achieved by using the provided git_commit_non_empty_tree function in a commit filter.

So the command that replaces --prune-empty in this test looks like this. Make sure that /tmp/commap doesn't exist or is empty before you run it.

git filter-branch --commit-filter 'echo -n "${GIT_COMMIT}," >>/tmp/commap; git_commit_non_empty_tree "$@" | tee -a /tmp/commap'
mv /tmp/commap ../commap
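
In the mapping file, a pruned commit shows up with the new hash of a commit that was already written, which is what makes deleted commits detectable afterwards. A minimal sketch of that check, assuming every line has the form old,new (pruned_commits is a hypothetical helper, and the short strings stand in for real hashes):

```python
from typing import List

def pruned_commits(commap_text: str) -> List[str]:
    """Return old hashes whose new hash was already emitted for an earlier commit."""
    seen = set()
    pruned = []
    for line in commap_text.splitlines():
        old, new = line.split(",")
        if new in seen:
            pruned.append(old)
        seen.add(new)
    return pruned

# Stand-in mapping for the oldest six commits N, M, L, K, J, I of the example.
sample = "n,n2\nm,n2\nl,l2\nk,l2\nj,l2\ni,i2\n"
print(pruned_commits(sample))  # ['m', 'k', 'j']
```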

Now that git filter-branch has run, we have all the information needed to deal with the tags. Some tags have to be deleted, and others have to be pointed at a different commit. We are lucky here: for a lightweight tag whose ref has not been packed, git stores the commit hash the tag points to simply in .git/refs/tags/TAGNAME.
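
Writing the hash into .git/refs/tags/TAGNAME depends on the ref being stored as a loose file. A hedged alternative is to let git update the ref itself with git update-ref. The sketch below only builds the command string; move_with_update_ref is a hypothetical helper and is not part of the script that follows. Like the file-based approach, it turns the tag into a lightweight tag.

```python
import shlex

def move_with_update_ref(tagname: str, new_commit: str) -> str:
    """Return a shell command that repoints a tag ref at new_commit."""
    ref = "refs/tags/" + tagname
    return "git update-ref {} {}".format(shlex.quote(ref), shlex.quote(new_commit))

print(move_with_update_ref("W", "0593fe3aa7a50d41602697f51f800d34b9887ba3"))
# git update-ref refs/tags/W 0593fe3aa7a50d41602697f51f800d34b9887ba3
```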

What's left is to write a script that corrects the tags automatically. Here is what I wrote in Python.

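# Reads ./tags and ./commap and prints the shell commands that fix the tags.

def delete(tagname):
    print('git tag -d {}'.format(tagname))

def move(tagname, tagref):
    # Overwrites the loose ref file of a lightweight tag.
    print('echo "{}" > .git/refs/tags/{}'.format(tagref, tagname))

# Map each tagged (old) commit hash to its tag name.
# The tags file alternates: tag name, commit hash, tag name, ...
tags = {}
with open('tags') as tagsfile:
    for i, line in enumerate(tagsfile):
        if i % 2 == 0:
            tagname = line.rstrip('\n')
        else:
            # if there are multiple tags on one commit
            # we discard all but one
            tagref = line.rstrip('\n')
            if tagref in tags:
                delete(tags[tagref])
            tags[tagref] = tagname

# Old-to-new hash pairs in the order filter-branch processed them (oldest first).
commap = []
with open('commap') as commapfile:
    for line in commapfile:
        old, new = line.rstrip('\n').split(',')
        commap.append((old, new))

lastnew = None    # new hash written for the previous commit
takentag = None   # tag still waiting for a surviving commit
for old, new in commap:
    if old in tags:
        # A second tag before any surviving commit: keep only the newer one.
        if takentag:
            delete(takentag)
        takentag = tags[old]
    if new != lastnew:
        # commit was not deleted
        if takentag:
            move(takentag, new)
            takentag = None
    lastnew = new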

The script outputs the commands needed to adjust the tags. In our example this is the output:

echo "0593fe3aa7a50d41602697f51f800d34b9887ba3" > .git/refs/tags/W
echo "93e65edf18ec8e33e5cc048e87f8a9c5270dd095" > .git/refs/tags/V
git tag -d U
echo "41d9e45de069df2c8f2fdf9ba1d2a8b3801e49b2" > .git/refs/tags/T
git tag -d S
echo "a0c4c919f841295cfdb536fcf8f7d50227e8f062" > .git/refs/tags/R
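
The order of these commands can be reproduced without a repository. The following sketch replays the same loop on symbolic stand-ins for the hashes, assuming a linear history in which a pruned commit maps to the new id of the nearest older surviving commit. The plan function and the primed names such as L' are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

def plan(tags: Dict[str, str], commap: List[Tuple[str, str]]) -> List[str]:
    """Replay the tag-fixing loop and return the actions as strings."""
    actions = []
    pending: Optional[str] = None   # tag waiting for a surviving commit
    last_new: Optional[str] = None
    for old, new in commap:
        if old in tags:
            if pending:
                actions.append("delete " + pending)
            pending = tags[old]
        if new != last_new and pending:
            actions.append("move {} -> {}".format(pending, new))
            pending = None
        last_new = new
    return actions

# Symbolic stand-ins for hashes: commit "L" is rewritten to "L'".
order = "NMLKJIHGFEDCBA"           # oldest first, as filter-branch visits them
deleted = set("MKJFDC")            # the empty commits from the diagram
tags = {"M": "W", "K": "V", "F": "U", "E": "T", "D": "S", "C": "R"}

commap, survivor = [], None
for c in order:
    if c not in deleted:
        survivor = c + "'"
    commap.append((c, survivor))   # a pruned commit reuses the previous new id

for line in plan(tags, commap):
    print(line)
```

This prints move W -> L', move V -> I', delete U, move T -> E', delete S, move R -> B', which is the same sequence as the commands above.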

After pasting the commands into the console, the git repository looks as expected:

$ git log --pretty=oneline
8945e933c1d8841ffee9e0bca1af1fce84c2977d A
a0c4c919f841295cfdb536fcf8f7d50227e8f062 B
41d9e45de069df2c8f2fdf9ba1d2a8b3801e49b2 E
6af1365157d705bff79e8c024df544fcd24371bb G
108ddf9f5f0a8c8d1e17042422fdffeb147361f2 H
93e65edf18ec8e33e5cc048e87f8a9c5270dd095 I
0593fe3aa7a50d41602697f51f800d34b9887ba3 L
5200d5046bc92f4dbe2aee4d24637655f2af5d62 N
$ git tag
R
T
V
W
$ git log -1 --pretty=oneline R
a0c4c919f841295cfdb536fcf8f7d50227e8f062 B
$ git log -1 --pretty=oneline T
41d9e45de069df2c8f2fdf9ba1d2a8b3801e49b2 E
$ git log -1 --pretty=oneline V
93e65edf18ec8e33e5cc048e87f8a9c5270dd095 I
$ git log -1 --pretty=oneline W
0593fe3aa7a50d41602697f51f800d34b9887ba3 L
answered Nov 09 '22 11:11 by timakro


I found using git_commit_non_empty_tree unreliable. Another approach, which is relatively simple, is to re-apply each tag to the first occurrence of its tree hash. This is not the 'correct' answer in the presence of back-outs (a back-out recreates a tree that already appeared in an earlier commit), but it might actually be desirable, depending on your use case.

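for tag in $(git tag)
do
  # tree hash of the tagged commit
  t=$(git rev-parse "$tag^{tree}")
  # oldest commit reachable from HEAD with that tree (git log lists newest first)
  r=$(git log --format='%T %H' | grep "^$t" | tail -n 1 | sed -e 's/.* //')
  git tag -f "$tag" "$r"
done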

The output of git log can obviously be cached. Run this loop after a filter-branch without --prune-empty, and then run

git filter-branch --prune-empty --tag-name-filter cat -- --all

to remove the empty commits. This only works for lightweight tags, but if you're using filtering you probably want to convert annotated tags to lightweight ones first and then reapply them at the end.
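
Caching the git log output could look like the following sketch, assuming the text of git log --format='%T %H' has been captured once into a string. The helper first_commit_per_tree is a hypothetical name, and the short tokens in the sample stand in for real hashes.

```python
from typing import Dict

def first_commit_per_tree(log_output: str) -> Dict[str, str]:
    """Map each tree hash to the oldest commit that has it.

    log_output lists commits newest first, so later lines overwrite
    earlier ones and the oldest commit wins.
    """
    first = {}
    for line in log_output.splitlines():
        if not line.strip():
            continue
        tree, commit = line.split()
        first[tree] = commit
    return first

sample = "t3 c4\nt2 c3\nt2 c2\nt1 c1\n"
lookup = first_commit_per_tree(sample)
print(lookup["t2"])  # c2
```

Each tag then needs only one git rev-parse for its tree hash and a dictionary lookup, instead of a fresh git log per tag.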

answered Nov 09 '22 09:11 by Jeremy