
How to automatically squash git repository history, without conflicts, in order to shrink it?

Tags: git, squash

I have a repository that has grown so big that it has become unusable: it is over 2GB and takes too long to clone. I now want to shrink it, but still be able to go back to some specific old versions. Shrinking will involve rewriting history, and I'm fine with that. People with clones will have to rebase, cherry-pick, or copy their files onto the new branches in a fresh clone of the new repo.

  • I have binary files in this repository, but I need them there (think of them as mandatory resources for the software to run). So I cannot really use filter-branch or BFG to remove big binary files, since I may need them when reverting to past commits.
  • I do not care about old, already merged branches (for example, feature branches), but I do care about some specific commits (for example, the heads of past release branches).
  • Since I'll be modifying many very old commits, I have no idea how to properly resolve merge conflicts (as can happen with a basic rebase or cherry-pick), so I'm looking for a solution that doesn't produce any conflicts, or produces only conflicts that can be resolved automatically.
  • I want to preserve all current branches, so that people who have work in progress in a clone can rebase or copy their changes onto them.
  • I want the history between my new commits to match the history of the old repo (as if the commits in between had been squashed). The current branches' history will start from one of these squashed commits.

I think of it as a squash of unneeded old repository history. What I have come up with so far as a possible process for my case (I am missing some steps and am still unsure this will do what I think) is:

  • Clone a mirror of the existing repo.
  • Create orphan branches from the old commits I want to keep. This will create parentless squashed commits with all the needed files in them.
  • Somehow link them to recreate the old repo history. How? Merge, rebase, or reset + commit the orphans?
  • Cherry-pick each current branch's commit list (using ranges), applying it to the latest commit that squashed the parent of its first divergent commit. How do I automatically find which commit to apply a cherry-picked commit range to? Will that work without conflicts?
  • Move the tags to the new tree, remove the previous tree, and run git garbage collection.
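One way the linking and replay steps above could be sketched, as an illustration rather than a tested migration procedure: `git commit-tree` creates a commit that reuses an existing tree object and takes its parent explicitly, so chaining squashed commits involves no merging at all. A branch that forked from a kept commit can then be replayed with `git rebase --onto`, which applies cleanly as long as the new base has the same tree as the old one. The demo repository, the tag names `v1` and `v2`, and the branch names `squashed` and `feature` are all hypothetical. Merge commits inside a replayed branch are not covered by this sketch.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; every name used here is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Four commits of "old" history, with two release tags to keep.
for i in 1 2 3 4; do
    echo "content $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
    case $i in
        2) git tag v1 ;;
        4) git tag v2 ;;
    esac
done

# A current branch that forked from the v2 release.
git checkout -q -b feature
echo "feature work" >> file.txt
git commit -q -a -m "feature commit"
old_feature_tree=$(git rev-parse 'feature^{tree}')

# Chain one squashed commit per kept release. commit-tree reuses the
# release's tree unchanged, so no patch is applied and nothing can conflict.
parent=""
for rev in v1 v2; do
    if [ -z "$parent" ]; then
        parent=$(git commit-tree "$rev^{tree}" -m "Squashed history up to $rev")
    else
        parent=$(git commit-tree "$rev^{tree}" -p "$parent" -m "Squashed history up to $rev")
    fi
done
git branch squashed "$parent"

# Replay the commits in v2..feature onto the squashed copy of v2.
# Old and new base share the same tree, so the patches apply as before.
git rebase -q --onto squashed v2 feature
new_feature_tree=$(git rev-parse 'feature^{tree}')

old_count=$(git rev-list --count v2)
new_count=$(git rev-list --count squashed)
feature_count=$(git rev-list --count feature)
echo "old=$old_count squashed=$new_count feature=$feature_count"
```

In a real repository the old tags would still point at the original commits, so they would need to be moved to the new commits and the old objects expired before garbage collection can actually free any space.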

Is this feasible without any conflicts? Will it work in all cases (a git commit tree can be pretty complex)? Is there a better solution to safely and automatically squash history?

It seems to me this type of maintenance task is bound to come up in a long-running project, so I'm assuming other big projects have already used some kind of solution. Perhaps there is an option to git init (or another command) that I am not aware of, to create a new repo from an old repo for this use case?

Update: I found the beginning of a solution here: https://wincent.com/wiki/Editing,_amending,_or_squashing_the_root_commit_in_a_Git_repository But I would like to do this multiple times across my history, in a fully automatic way (i.e. without conflicts).

Asmodehn asked Nov 26 '25 04:11


1 Answer

You can clone just a part of the repo:

git clone --depth <depth> <repository-url>

This is called a shallow clone.
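As a minimal sketch of what a shallow clone does, the script below builds a throwaway repository with five commits and clones only the two most recent ones. The directory names and commit contents are hypothetical demo data. The `file://` URL is used because git ignores `--depth` when cloning from a plain local path.

```shell
#!/bin/sh
set -eu

# Throwaway source repository with five commits (hypothetical demo data).
work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5; do
    echo "content $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# Clone only the two most recent commits of the current branch.
cd "$work"
git clone -q --depth 2 "file://$work/source" shallow

full_count=$(git -C source rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full=$full_count shallow=$shallow_count"
```

Note that this only reduces what each clone downloads; the repository on the server keeps its full history. A shallow clone can later be completed with `git fetch --unshallow` if the older commits turn out to be needed.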

There was a post on the Atlassian blog a while ago that offers other strategies for dealing with a large repo.

Richard Hulse answered Nov 28 '25 22:11


