 

Collapsing a git repository's history

We have a git project which has quite a big history.

Specifically, early on there were quite a lot of binary resource files in the project. These have now been removed, as they're effectively external resources.

However, the size of our repository is >200MB (the total checkout is currently ~20MB) due to having these files previously committed.

What we'd like to do is "collapse" the history so that the repository appears to have been created from a later revision than it was. For example:

1-----2-----3-----4-----+---+---+
                   \       /
                    +-----+---+---+

  1. Repository created
  2. Large set of binary files added
  3. Large set of binary files removed
  4. New intended 'start' of repository

So effectively we want to lose the project history before a certain point. At this point there is only one branch, so there's no complication with trying to deal with multiple start points etc. However, we don't want to lose all of the history and start a new repository with the current version.

Is this possible, or are we doomed to have a bloated repository forever?

Gareth asked Oct 30 '08 13:10




2 Answers

You can remove the binary bloat and keep the rest of your history. Git allows you to reorder and 'squash' prior commits, so you can combine just the commits that add and remove your big binary files. If the adds were all done in one commit and the removals in another, this will be much easier than dealing with each file.

$ git log --stat       # list all commits and commit messages  

Search this for the commits that add and delete your binary files and note their SHA1s, say 2bcdef and 3cdef3.
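If scanning the full git log --stat output is tedious, the same search can be narrowed to a path. This is only a sketch in a throwaway repository: the file name big.bin and the commit messages are made up for the demo, and --diff-filter=A / --diff-filter=D restrict the log to commits that added or deleted the given path.

```shell
set -eu

# Throwaway repository; every name here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Demo User"
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Repository created"

# Stand-in for the large binary resources.
head -c 65536 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add binary files"
expected_add=$(git rev-parse HEAD)

git rm -q big.bin
git commit -q -m "Remove binary files"
expected_del=$(git rev-parse HEAD)

# Commits that added (A) or deleted (D) the path.
added=$(git log --diff-filter=A --format=%H -- big.bin)
deleted=$(git log --diff-filter=D --format=%H -- big.bin)

echo "added in:   $added"
echo "removed in: $deleted"
```

In a real repository you would replace big.bin with the path or glob of your own resource files.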

Then, to edit the repo's history, use git rebase with its interactive option (-i), starting with the parent of the commit where you added your binaries. It will launch your $EDITOR and you'll see a list of commits starting with 2bcdef:

$ git rebase -i 2bcdef^    # generate a pick list of all commits starting with 2bcdef
# Rebasing zzzzzz onto yyyyyyy
#
# Commands:
#  pick = use commit
#  edit = use commit, but stop for amending
#  squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 2bcdef   Add binary files and other edits
pick xxxxxx   Another change
  .
  .
pick 3cdef3   Remove binary files; link to them as external resources
  .
  .

Insert squash 3cdef3 as the second line and remove the line which says pick 3cdef3 from the list. You now have a list of actions for the interactive rebase which will combine the commits that add and delete your binaries into one commit whose diff is just any other changes in those commits. When you save and close the editor, Git applies the list and reapplies all of the subsequent commits in order. If it stops along the way (for example, on a conflict), fix the problem and tell it to carry on:

$ git rebase --continue 

This will take a minute or two.
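For anyone who wants to try the squash on something disposable first, here is a rough, scripted version of the same edit. It is a sketch under assumptions: the repository, file names and commit messages are invented, and instead of editing the pick list by hand it hands Git a prepared list through GIT_SEQUENCE_EDITOR, with GIT_EDITOR=true accepting the default combined commit message.

```shell
set -eu

# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
todo=$(mktemp)
cd "$repo"
git init -q
git config user.name  "Demo User"
git config user.email "demo@example.com"

commit_all() { git add -A && git commit -q -m "$1"; }

echo v1 > app.txt
commit_all "Repository created"

head -c 65536 /dev/urandom > big.bin   # stand-in for the binary resources
echo notes > notes.txt                 # the "other edits" in the same commit
commit_all "Add binary files and other edits"
add=$(git rev-parse HEAD)

echo v2 > app.txt
commit_all "Another change"
other=$(git rev-parse HEAD)

git rm -q big.bin
git commit -q -m "Remove binary files"
remove=$(git rev-parse HEAD)

echo v3 > app.txt
commit_all "Later work"
later=$(git rev-parse HEAD)

# The edited pick list: the removal is squashed into the addition.
printf 'pick %s\nsquash %s\npick %s\npick %s\n' \
  "$add" "$remove" "$other" "$later" > "$todo"

# Git calls the sequence editor with the todo file as its last argument,
# so "cp <prepared list>" overwrites the generated list with ours.
GIT_SEQUENCE_EDITOR="cp '$todo'" GIT_EDITOR=true git rebase -q -i "$add^"

commits=$(git rev-list --count HEAD)
touching=$(git log --format=%H -- big.bin)
echo "commits on branch: $commits"
echo "commits touching big.bin: ${touching:-none}"
```

After the rebase the branch should hold one commit fewer than before, and no commit reachable from HEAD should mention big.bin any more.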
You now have a repo whose history no longer has the binaries coming or going. But they will still take up space because, by default, Git keeps unreachable commits in the reflog for 30 days before they can be garbage-collected, so that you can change your mind. If you want to remove them now:

$ git reflog expire --expire=1.minute refs/heads/master       # all deletions up to 1 minute ago available to be garbage-collected
$ git fsck --unreachable      # lists all the blobs (files) that will be garbage-collected
$ git prune
$ git gc

Now you've removed the bloat but kept the rest of your history.
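To check that the cleanup really frees the space, you can watch a single blob disappear in a scratch repository. This is a sketch, not the exact commands above: it uses the variant --expire=now --all with git gc --prune=now, because the script runs in seconds (so a one-minute cutoff would expire nothing) and because HEAD keeps its own reflog next to the branch's. The file name big.bin and the use of git reset to stand in for the history rewrite are assumptions for the demo.

```shell
set -eu

# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Demo User"
git config user.email "demo@example.com"

echo keep > app.txt
git add app.txt
git commit -q -m "Keep this"

head -c 65536 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add binary files"
blob=$(git rev-parse HEAD:big.bin)

# Stand-in for the history rewrite: drop the commit from the branch.
git reset -q --hard HEAD~1

# The blob is off the branch but still in the object store via the reflog.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Expire every reflog entry, then prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi

echo "blob before cleanup: $before"
echo "blob after cleanup:  $after"
```

Bear in mind that expiring the reflog this way throws away the safety net the 30-day default gives you, so it is worth doing only once you are happy with the rewritten history.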

Paul answered Oct 06 '22 20:10


You can use git filter-branch with grafts to make commit number 4 the new root commit of your branch. Create the file .git/info/grafts with a single line containing the SHA1 of commit number 4.

If you now do a git log or gitk you will see that those commands will display commit number 4 as the root of your branch. But nothing will have actually changed in your repository. You can delete .git/info/grafts and the output of git log or gitk will be as before. To actually make commit number 4 the new root you will have to run git filter-branch, with no arguments.
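Writing the grafts file might look like the sketch below. The repository and commit messages are invented for the demo, and "Commit 4" is simply picked as HEAD~1. The script stops after creating the file; it does not run git filter-branch.

```shell
set -eu

# Throwaway repository with five commits; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Demo User"
git config user.email "demo@example.com"

for n in 1 2 3 4 5; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "Commit $n"
done

# The commit that should become the new root ("Commit 4" here).
new_root=$(git rev-parse HEAD~1)

# A graft line holding only one SHA1 declares that commit to have no parents.
mkdir -p .git/info
echo "$new_root" > .git/info/grafts

echo "grafts file now contains:"
cat .git/info/grafts
```

Note that recent Git releases mark the grafts file as deprecated and print a hint suggesting git replace instead, so how this behaves depends on your Git version.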

davitenio answered Oct 06 '22 22:10