I'm working with some friends, and we are all new to Git. One of them committed a large number of external binary files, which slows down the repository and takes up a lot of disk space.
We've just started the project, so there's nothing important in it except a readme file. What we'd like to do is clear the repository history down to the current state.
So basically it looks like this:
Head -> A -> B -> C total disk size 45 MB, 1 file, 300 deleted files
And we want this:
Head -> A total disk size 1 kB, 1 file, 0 deleted files
The obvious solution would be to create a new repository and just copy the readme file into it. However, out of curiosity, I'd like to learn whether there's a Git command that can do this.
I've been experimenting with the rebase command, but it seems to keep the old history and its data. That confuses me: if rebasing doesn't prune data from the repository, then you might as well not use it.
I've been googling other posts on this issue, and I suspect that you can't do this with Git. However, I'd like to confirm that.
And yes, it's a remote repository on GitHub.
Thanks for any help.
So for my solution I chose to:
rebase using TortoiseGit
squash all commits
then, using Git Bash:
git reflog expire --all --expire-unreachable=now
git gc --aggressive --prune=now
git push origin master --force
The local repository doesn't seem to shrink in disk size. However, cloning the repository again shows the desired results and disk size, and so does the repository log.
Thanks for the helpful replies. Interesting: rebase seems very powerful.
Rebasing (git rebase -i --root; if you didn't revert the bad commit, just delete its line; if you did, squash the bad commit with the revert commit) or using filter-branch will clear the data from your branch's history, but won't make it disappear from the repository entirely.
This is because, for safety and traceability reasons, git keeps a reflog (visible with git log -g) which tracks every commit you made, whether or not it's still part of the ancestry graph.
Cloning the filtered repo won't clone the hidden data, and you can also remove it in-place with these commands:
git reflog expire --all --expire-unreachable=now
git gc --aggressive --prune=now
Those commands aren't normally recommended and the unreferenced commits would expire in 30 days anyway, but since your repository is practically new you're not risking much.
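To see the effect of the reflog for yourself, here is a minimal sketch in a throwaway repository. The file name big.bin, the demo identity, and the use of git commit --amend as a simple stand-in for the interactive rebase are assumptions made for illustration; the two cleanup commands are the ones quoted above.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so no real repository is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit containing a readme plus an unwanted binary (hypothetical file).
echo "readme" > README
head -c 200000 /dev/urandom > big.bin
git add README big.bin
git commit -q -m "Add readme and binary"
blob=$(git rev-parse HEAD:big.bin)

# Rewrite the commit without the binary (stand-in for the rebase/squash).
git rm -q big.bin
git commit -q --amend -m "Add readme"

# The old commit is now reachable only from the reflog, yet its blob is still stored.
git cat-file -e "$blob" && echo "after rewrite: blob still in object store"

# Expire the reflog entries, then prune the now-unreachable objects.
git reflog expire --all --expire-unreachable=now
git gc -q --aggressive --prune=now

if git cat-file -e "$blob" 2>/dev/null; then
    echo "after gc: blob still in object store"
else
    echo "after gc: blob pruned"
fi
```

Running git count-objects -vH before and after the last two commands is another way to watch the on-disk size change.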
You don't need to lose your history entirely. You can just rewrite it using filter-branch. This is a pretty destructive command, so make a copy first. This example will go through your history removing all jar files.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch -- "*.jar"' HEAD
Adjust this to match whatever giant files were accidentally added. Note that modifying commits changes their IDs, so people will probably want to re-clone the repository after this to avoid terrible conflicts. You will also need to --force the push back to the repository, as git will complain (rightly) that the history has changed a lot.
Your local repo may not immediately shrink in size until it decides to do garbage collection.
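If you want to try the rewrite somewhere safe first, the following is a rough sketch in a scratch repository. It uses the --index-filter form with git rm --cached --ignore-unmatch, which is the pattern shown in the git filter-branch documentation, so that commits without any jar file don't make the filter fail. The path lib/vendor.jar and the demo identity are made up for the example.

```shell
#!/bin/sh
set -e

# Scratch repository; nothing real is modified.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# History: a readme, then a commit adding a jar in a subdirectory, then an edit.
echo "readme" > README
git add README
git commit -q -m "A: readme"

mkdir lib
head -c 200000 /dev/urandom > lib/vendor.jar   # hypothetical large file
git add lib/vendor.jar
git commit -q -m "B: add jar"

echo "more text" >> README
git commit -q -a -m "C: edit readme"

# Recent Git versions print a warning and pause before filter-branch runs;
# this environment variable skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# --cached edits only the index of each rewritten commit.
# --ignore-unmatch keeps the filter from failing on commits with no jar.
# Quoting "*.jar" lets git, not the shell, match the pattern in every directory.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch -- "*.jar"' HEAD >/dev/null

# Count jar paths still mentioned anywhere in the rewritten branch history.
jar_paths=$(git log --format= --name-only HEAD | grep -c '\.jar$' || true)
echo "jar paths left in history: $jar_paths"
echo "commits on branch: $(git rev-list --count HEAD)"
```

The commit count stays the same because filter-branch keeps commits that become empty unless you also pass --prune-empty. The original commits remain reachable from the backup ref under refs/original and from the reflog until those are removed.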
You may want to look at "Squashing all Git commits into a single commit". That also references a Stack Overflow question (which might be called a duplicate): "How to squash all git commits into one?"
The solution mentioned by Wincent in the first link is about halfway down the page. A quick test locally shows that it does work as advertised. For your reference, Wincent suggests:
git update-ref -d refs/heads/master
git commit -m "Initial import"
FWIW, you'll probably need to run git gc --prune=now to clean up any unreferenced objects. And when you push up the new master, you'll need to use --force. You should probably create a backup before trying any of this out. :-)
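For anyone who wants to check this locally before touching a real repository, here is a small sketch of the same two commands in a scratch repository. The three demo commits, the file name big.bin, and the demo identity are invented to mirror the A -> B -> C situation in the question.

```shell
#!/bin/sh
set -e

# Scratch repository; nothing real is modified.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three commits: readme, add a binary, delete the binary again.
echo "readme" > README
git add README
git commit -q -m "A: readme"
head -c 200000 /dev/urandom > big.bin   # hypothetical large file
git add big.bin
git commit -q -m "B: add binary"
git rm -q big.bin
git commit -q -m "C: remove binary"

echo "commits before: $(git rev-list --count HEAD)"

# Delete the branch ref. The index and working tree are left untouched,
# so the next commit becomes a new root commit holding the current files.
git update-ref -d refs/heads/master
git commit -q -m "Initial import"

echo "commits after: $(git rev-list --count HEAD)"
git ls-files
```

In a real repository with a remote, this would be followed by the forced push mentioned above.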