
Can I rewrite an entire git repository's history to include something we forgot?

We recently completed a conversion from Mercurial to Git. Everything went smoothly, and we were even able to get the transforms needed to make everything look and work relatively correctly in the repository. We added a .gitignore and got underway.

However, we're experiencing extreme slowdowns as soon as we incorporate or work with any of our old feature branches. After a little exploring, we found that the .gitignore was only added to the develop branch. When we look at other commits without merging develop up into them, git chugs because it is trying to analyze all our build artifacts (binary files), since there was no .gitignore file on these old branches.

What we'd like to do is effectively insert a new root commit with the .gitignore so that it retroactively appears in all heads and tags. We're comfortable with rewriting history. Our team is relatively small, so getting everyone to halt for this operation and re-pull their repositories once the rewrite is done is no problem.

I've found information about rebasing master onto a new root commit, and this works for master. The problem is that it leaves our feature branches detached on the old history tree, and it also replays the entire history with a new commit date/time.

Any ideas or are we out of luck on this one?

Asked by Aren on Jan 13 '15 at 17:01


1 Answer

What you want to do involves two phases: retroactively adding a new root commit with a suitable .gitignore, and scrubbing your history to remove files that should not have been added. The git filter-branch command can do both.

Setup

Consider a representative sample of your history.

$ git lola --name-status
* f1af2bf (HEAD, bar-feature) Add bar
| A     .gitignore
| A     bar.c
| D     main.o
| D     module.o
| * 71f711a (master) Add foo
|/
|   A   foo.c
|   A   foo.o
* 7f1a361 Commit 2
| A     module.c
| A     module.o
* eb21590 Commit 1
  A     main.c
  A     main.o

For clarity, the *.c files represent C source files and *.o are compiled object files that should have been ignored.

On the bar-feature branch, you added a suitable .gitignore and deleted object files that should not have been tracked, but you want that policy reflected everywhere in your import.

Note that git lola is a non-standard but useful alias.

git config --global alias.lola \
  'log --graph --decorate --pretty=oneline --abbrev-commit --all'

New Root Commit

Create the new root commit as follows.

$ git checkout --orphan new-root
Switched to a new branch 'new-root'

The git checkout documentation notes a possible unanticipated state of the new orphan branch.

If you want to start a disconnected history that records a set of paths that is totally different from the one of start_point, then you should clear the index and the working tree right after creating the orphan branch by running git rm -rf . from the top level of the working tree. Afterwards you will be ready to prepare your new files, repopulating the working tree, by copying them from elsewhere, extracting a tarball, etc.

Continuing our example:

$ git rm -rf .
rm 'foo.c'
rm 'foo.o'
rm 'main.c'
rm 'main.o'
rm 'module.c'
rm 'module.o'

$ echo '*.o' >.gitignore

$ git add .gitignore

$ git commit -m 'Create .gitignore'
[new-root (root-commit) 00c7780] Create .gitignore
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore
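
If you would rather not empty the working tree, a root commit of the same shape can probably be built with plumbing commands instead. This is a different technique from the checkout shown above, offered only as a sketch: make_root is a hypothetical helper name, and it assumes the branch name new-root and a one-line ignore pattern.

```shell
# Sketch: create a parentless commit holding only .gitignore, without
# touching the index or the working tree. make_root is a hypothetical name.
make_root() {
  pattern=$1   # contents of the new .gitignore, e.g. '*.o'
  # Store the file contents as a blob and capture its ID.
  blob=$(printf '%s\n' "$pattern" | git hash-object -w --stdin) || return 1
  # Build a tree with a single entry: mode, type, ID, TAB, path.
  tree=$(printf '100644 blob %s\t.gitignore\n' "$blob" | git mktree) || return 1
  # commit-tree without any -p option yields a root commit.
  commit=$(git commit-tree -m 'Create .gitignore' "$tree") || return 1
  # Point the (assumed) branch name at it and print the new SHA.
  git branch new-root "$commit" && printf '%s\n' "$commit"
}

# Usage, from inside the repository:
#   make_root '*.o'
```

Because nothing is checked out, the current branch and its files stay as they were; only the new-root ref is added.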

Now the history looks like

$ git lola
* 00c7780 (HEAD, new-root) Create .gitignore
* f1af2bf (bar-feature) Add bar
| * 71f711a (master) Add foo
|/
* 7f1a361 Commit 2
* eb21590 Commit 1

That is slightly misleading because it makes new-root look like it is a descendant of bar-feature, but it really has no parent.

$ git rev-parse HEAD^
HEAD^
fatal: ambiguous argument 'HEAD^': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Make note of the SHA for the orphan because you will need it later. In this example, it is

$ git rev-parse HEAD
00c778087723ae890e803043493214fb09706ec7
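
Rather than copying SHAs by hand, both can be looked up. The sketch below assumes the orphan branch is called new-root, as above, and that the imported history is reachable from master; find_roots is a hypothetical helper name.

```shell
# Sketch: look up the old and new root commits instead of pasting them.
# Assumes branches named master and new-root; adjust to your repository.
find_roots() {
  new_root=$(git rev-parse --verify new-root) || return 1
  # --max-parents=0 keeps only parentless commits, i.e. the roots
  # reachable from master.
  old_root=$(git rev-list --max-parents=0 master) || return 1
  printf 'old root: %s\nnew root: %s\n' "$old_root" "$new_root"
}
```

If the import produced more than one root, old_root will hold several lines and each would need the same treatment.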

Rewriting History

We want git filter-branch to make three broad changes.

  1. Splice in the new root commit.
  2. Delete all the temporary files.
  3. Use the .gitignore from the new root unless one already exists.

On the command line, that is incanted as

git filter-branch \
  --parent-filter '
    test $GIT_COMMIT = eb215900cd15ca2cf9ded74f1a0d9d25f65eb2bf && \
              echo "-p 00c778087723ae890e803043493214fb09706ec7" \
      || cat' \
  --index-filter '
    git rm --cached --ignore-unmatch "*.o"; \
    git ls-files --cached --error-unmatch .gitignore >/dev/null 2>&1 ||
      git update-index --add --cacheinfo \
        100644,$(git rev-parse new-root:.gitignore),.gitignore' \
  --tag-name-filter cat \
  -- --all

Explanation:

  • The --parent-filter option hooks in your new root commit.
    • eb215... is the full SHA of the old root commit, cf. git rev-parse eb215
  • The --index-filter option has two parts:
    • Running git rm as above removes anything matching *.o from the index of every rewritten commit, at any depth in the tree, because the glob pattern is quoted and therefore interpreted by git rather than the shell.
    • Check for an existing .gitignore with git ls-files, and if it is not there, point to the one in new-root.
  • If you have any tags, they will be mapped over with the identity operation, cat.
  • The lone -- terminates options, and --all is shorthand for all refs.
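
The same rewrite can be wrapped so that the SHAs are passed in rather than pasted into the filters. This is a sketch under assumptions, not a drop-in tool: splice_root is a hypothetical helper name, the *.o pattern is the one from this example, and FILTER_BRANCH_SQUELCH_WARNING is the environment variable that recent Git versions check before printing their filter-branch warning and pausing.

```shell
# Sketch: the filter-branch call above, parameterized by the two root SHAs.
# splice_root is a hypothetical name, not a git command.
splice_root() {
  old_root=$1   # full SHA of the existing root commit
  new_root=$2   # full SHA of the commit that holds only .gitignore
  # Blob ID of the .gitignore stored in the new root commit.
  blob=$(git rev-parse --verify "$new_root:.gitignore") || return 1
  # The filters are double-quoted so the SHAs expand now; \$GIT_COMMIT is
  # escaped so that filter-branch expands it once per commit.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --parent-filter "
      test \$GIT_COMMIT = $old_root && echo '-p $new_root' || cat" \
    --index-filter "
      git rm -q --cached --ignore-unmatch '*.o'
      git ls-files --error-unmatch .gitignore >/dev/null 2>&1 ||
        git update-index --add --cacheinfo 100644,$blob,.gitignore" \
    --tag-name-filter cat \
    -- --all
}

# Usage, assuming branches named master and new-root:
#   splice_root "$(git rev-list --max-parents=0 master)" \
#               "$(git rev-parse new-root)"
```

Trying this on a throwaway clone first is a cheap way to confirm the filters do what you expect on your own history.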

The output you see will resemble

Rewrite eb215900cd15ca2cf9ded74f1a0d9d25f65eb2bf (1/5)rm 'main.o'
Rewrite 7f1a361ee918f7062f686e26b57788dd65bb5fe1 (2/5)rm 'main.o'
rm 'module.o'
Rewrite 71f711a15fa1fc60542cc71c9ff4c66b4303e603 (3/5)rm 'foo.o'
rm 'main.o'
rm 'module.o'
Rewrite f1af2bf89ed2236fdaf2a1a75a34c911efbd5982 (5/5)
Ref 'refs/heads/bar-feature' was rewritten
Ref 'refs/heads/master' was rewritten
WARNING: Ref 'refs/heads/new-root' is unchanged

Your originals are still safe. The master branch now lives under refs/original/refs/heads/master, for example. Review the changes in your newly rewritten branches. When you are ready to delete the backup, run

git update-ref -d refs/original/refs/heads/master

You could cook up a command to cover all backup refs in one command, but I recommend careful review for each one.
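
To make that review less tedious, something like the following could list every backup ref next to the files that differ from its rewritten counterpart, with deletion kept as a separate, deliberate step. Both function names are hypothetical; the only assumption about the repository is that the backups live under refs/original, as filter-branch leaves them.

```shell
# Sketch: review the backup refs first, delete them only afterwards.
# list_backups and drop_backups are hypothetical helper names.
list_backups() {
  for backup in $(git for-each-ref --format='%(refname)' refs/original); do
    current=${backup#refs/original/}   # e.g. refs/heads/master
    printf '%s -> %s\n' "$backup" "$current"
    # Files that differ between the backup and the rewritten ref.
    git diff --name-status "$backup" "$current"
  done
}

drop_backups() {
  # Collect the names first so no ref is deleted while still being listed.
  backups=$(git for-each-ref --format='%(refname)' refs/original)
  for backup in $backups; do
    git update-ref -d "$backup"
  done
}

# Usage: run list_backups, read the output, and only then drop_backups.
```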

Conclusion

Finally, the new history is

$ git lola --name-status
* ab8cb1c (bar-feature) Add bar
| M     .gitignore
| A     bar.c
| * 43e5658 (master) Add foo
|/
|   A   foo.c
* 6469dab Commit 2
| A     module.c
* 47f9f73 Commit 1
| A     main.c
* 00c7780 (HEAD, new-root) Create .gitignore
  A     .gitignore

Observe that all the object files are gone. The modification to .gitignore in bar-feature is because I used different contents to make sure it would be preserved. For completeness:

$ git diff new-root:.gitignore bar-feature:.gitignore
diff --git a/new-root:.gitignore b/bar-feature:.gitignore
index 5761abc..c395c62 100644
--- a/new-root:.gitignore
+++ b/bar-feature:.gitignore
@@ -1 +1,2 @@
 *.o
+*.obj

The new-root ref is no longer useful, so dispose of it with

$ git checkout master
$ git branch -d new-root
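
Before asking the team to re-clone, a couple of quick checks can confirm the result. This is a sketch with hypothetical function names; it looks only at branches and tags, so any refs/original backups are left out of the check.

```shell
# Sketch: sanity checks on the rewritten history.
# leftover_objects and count_roots are hypothetical helper names.
leftover_objects() {
  # Commits on any branch or tag that still touch a path matching *.o.
  git log --branches --tags --format='%h %s' -- '*.o'
}

count_roots() {
  # Number of parentless commits reachable from branches and tags.
  git rev-list --max-parents=0 --branches --tags | wc -l | tr -d ' '
}

# Usage:
#   leftover_objects   # no output means no object files remain
#   count_roots        # a single shared root should give 1
```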

Answered by Greg Bacon on Sep 21 '22 at 17:09