We're running a central Git repository (gforge) that everyone pulls from and pushes to. Unfortunately, some inept co-workers decided that pushing several 10-100 MB jar files into the repo was a good idea. As a consequence, the server, which we use a lot, has run out of disk space.
We only realised this when it was too late and most people had already pulled the new, huge repo. If the commits hadn't been pushed, we could just rebase to snip them out. Now that everyone has pulled them, what is the best way to remove those commits (or rebase to remove just the large files) without causing chaos when everyone next pulls from or pushes to the repo?
It's supposed to be a small repo for scripts, but it is now about 700 MB in size :-(
To remove the last commit from your current branch, you can simply run git reset --hard HEAD^. If you are removing multiple commits from the top, you can run git reset --hard HEAD~2 to remove the last two commits. You can increase the number to remove even more commits. Note that this only rewinds your local branch; commits that have already been pushed stay on the server.
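A minimal sketch of what the reset does, run in a throwaway repository so nothing real is touched. The directory, file names and the Demo identity are assumptions made up for the illustration.

```shell
#!/bin/sh
# Throwaway repository; the identity below is a placeholder for the demo.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

reset_demo=$(mktemp -d)
cd "$reset_demo" || exit 1
git init -q

# Three small commits so there is something to drop.
for n in 1 2 3; do
    echo "step $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "commit $n"
done

old_head=$(git rev-parse HEAD)

# Drop the last two commits from the current branch.
git reset -q --hard HEAD~2

commits_left=$(git rev-list --count HEAD)   # commits still on the branch
saved_head=$(git rev-parse ORIG_HEAD)       # reset records the old tip here
echo "commits left: $commits_left"
```

The old tip is still recorded in ORIG_HEAD and in the reflog, so the dropped commits are not deleted from disk right away. On its own, a reset therefore does not shrink a repository.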
If you commit sensitive data, such as a password or SSH key, into a Git repository, you can remove it from the history. To entirely remove unwanted files from a repository's history, you can use either the git filter-repo tool or the open source BFG Repo-Cleaner.
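Before rewriting anything, it helps to know which files are actually taking up the space. The following is a rough sketch using only stock git plumbing; the function name list_large_blobs, the 1000000-byte threshold and the demo file names are assumptions for illustration.

```shell
#!/bin/sh
# Print "<size-in-bytes> <path>" for every blob reachable from any ref that
# is at least <min_bytes> large, biggest first. Run inside a repository.
list_large_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$1" '$1 == "blob" && $2 + 0 >= min + 0' |
        sort -k2,2nr |
        cut -d' ' -f2-
}

# Demo in a throwaway repository (placeholder identity and file names).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

blob_demo=$(mktemp -d)
cd "$blob_demo" || exit 1
git init -q
echo 'echo hello' > script.sh
dd if=/dev/zero of=big.jar bs=1024 count=2048 2>/dev/null   # 2 MiB dummy jar
git add script.sh big.jar
git commit -q -m 'add script and jar'
git rm -q big.jar
git commit -q -m 'remove jar again'

# The jar was deleted in the latest commit but is still listed,
# because it remains part of the history.
large=$(list_large_blobs 1000000)
echo "$large"
```

Once the offending paths are known, they can be passed to whichever tool you pick. If git filter-repo is installed, something like git filter-repo --strip-blobs-bigger-than 10M or git filter-repo --path big.jar --invert-paths should do the removal, and BFG has a similar --strip-blobs-bigger-than option. Check each tool's documentation for the exact behaviour before running it on a real repository.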
The easiest way to avoid chaos is to give the server more disk.
This is a tough one. Removing the files requires removing them from the history too, which can be done with git filter-branch. This command, for example, would remove <file> from the history:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <file>' \
--prune-empty --tag-name-filter cat -- --all
The problem is that this rewrites SHA-1 hashes, meaning everyone on the team will need to reset to the new branch version or risk some serious headaches. That's all fine and good if no one has work in progress and you all use topic branches. If you're more centralized, your team is large, or many of them keep dirty working directories while they work, there's no way to do this without a little bit of chaos and discord. You could spend quite a while getting everyone's local repository working correctly. That said, git filter-branch is probably the best solution. Just make sure you've got a plan, your team understands it, and they back up their local repositories in case some vital work in progress gets lost or munged.
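Since the original problem is disk space, it is worth noting that the rewrite alone does not free anything: git filter-branch keeps the old refs under refs/original/, and the reflog still points at the old commits. Below is a sketch of the full sequence in a throwaway repository, assuming a dummy file called big.jar and a placeholder Demo identity. Try it on a copy before touching the real repository.

```shell
#!/bin/sh
# Placeholder identity for the demo; silence filter-branch's startup warning.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo 'echo hello' > script.sh
git add script.sh
git commit -q -m 'add script'
dd if=/dev/zero of=big.jar bs=1024 count=2048 2>/dev/null   # dummy large file
git add big.jar
git commit -q -m 'add jar'
echo 'echo bye' >> script.sh
git commit -q -a -m 'update script'

# Rewrite every ref so that big.jar never existed; the commit that only
# added the jar becomes empty and is dropped by --prune-empty.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big.jar' \
    --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Drop the backup refs and reflog entries that still reference the old
# history, then let gc delete the now unreachable objects.
git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

commits=$(git rev-list --count HEAD)
leftover=$(git log --all --oneline -- big.jar)
git count-objects -vH
```

The same principle applies to the central repository after the rewritten branches are force-pushed: the old objects only disappear once nothing references them and a garbage collection has run there.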
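One possible plan would be:
1. Have each team member save any uncommitted work in progress, e.g. with git diff > ~/my_wip.
2. Have them turn their unpushed local commits into patches with git format-patch <branch>.
3. Run git filter-branch and force-push the rewritten branches. Make sure the team knows not to pull while this is happening.
4. Have everyone run git fetch && git reset --hard origin/<branch>, or have them clone the repository afresh.
5. Have them re-apply their commits with git am <patch>.
6. Have them re-apply their work in progress with git apply, e.g. git apply ~/my_wip.
Check out https://help.github.com/articles/remove-sensitive-data as well. It is about removing sensitive data from a Git repository, but you can use the same approach to remove large files from your commits.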
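The save, rewrite and restore plan above can be rehearsed end to end in a scratch directory before asking the whole team to follow it. The sketch below is a simulation under stated assumptions: the repository names maint, central.git and dev, the branch name main, the file names and the Demo identity are all invented for the example.

```shell
#!/bin/sh
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)

# Maintainer repository with a script and a large jar, published to a
# bare "central" repository.
git init -q "$work/maint"
cd "$work/maint" || exit 1
git symbolic-ref HEAD refs/heads/main
echo 'echo one' > script.sh
git add script.sh
git commit -q -m 'add script'
dd if=/dev/zero of=big.jar bs=1024 count=2048 2>/dev/null
git add big.jar
git commit -q -m 'add jar'
git clone -q --bare . "$work/central.git"
git remote add origin "$work/central.git"

# A developer clones, commits locally and also has uncommitted edits.
git clone -q "$work/central.git" "$work/dev"
cd "$work/dev" || exit 1
echo 'echo tool' > tool.sh
git add tool.sh
git commit -q -m 'add tool'
echo 'echo two' >> script.sh            # work in progress, not committed

# Steps 1 and 2: save the work in progress and the unpushed commits.
git diff > "$work/my_wip"
git format-patch -q -o "$work/patches" origin/main

# Step 3 (maintainer): rewrite history and force-push it.
cd "$work/maint" || exit 1
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big.jar' \
    --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1
git push -q --force origin main

# Steps 4 to 6 (developer): adopt the new history, then restore the work.
cd "$work/dev" || exit 1
git fetch -q origin
git reset -q --hard origin/main
git am -q "$work"/patches/*.patch
git apply "$work/my_wip"

dev_commits=$(git rev-list --count HEAD)
jar_history=$(git log --oneline -- big.jar)
```

Note that the patches have to be created before the fetch. After the fetch, origin/main points at the rewritten history, and format-patch would also pick up the old commit that added the jar.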