 

Clean up large files on git server

Tags: git, gitlab

Someone accidentally committed some large (multi-GB) binaries to my self-hosted GitLab repository, and now every time someone tries to pull from the repository the server gets hit really hard.

I tried removing any reference to the files via force push, but it still seems to impact the server. Is there a way to force the GitLab server to get rid of them?

I read up on things like filter-branch, but I'm not sure what that would do to a bare repo or how I'd even use it on a commit I no longer have a reference to.

Update: For reference, these types of messages are appearing on the GitLab VM's console:

[ 5099.922896] Out of memory: kill process 6200 (git-upload-pack) score 1053982 or a child
[ 5099.922908] Killed process 6202 (git)
[ 5099.930796] Out of memory: kill process 6200 (git-upload-pack) score 360394 or a child
[ 5099.930807] Killed process 6203 (git)
[ 5099.938875] Out of memory: kill process 6200 (git-upload-pack) score 360394 or a child
[ 5099.938886] Killed process 6203 (git)
[ 5099.951163] Out of memory: kill process 6139 (git-upload-pack) score 324327 or a child
[ 5099.951174] Killed process 6151 (git)
Karl asked Aug 11 '15


2 Answers

As the OP Karl confirms in the comments, running BFG Repo-Cleaner on the server side (directly in the bare repo) is enough to remove the large binaries.
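
For reference, a server-side BFG run could look something like the sketch below; the jar location, the repository path, and the 100M size threshold are placeholders to adapt to your own GitLab install:

# run directly against the bare repository on the GitLab server
java -jar bfg.jar --strip-blobs-bigger-than 100M /path/to/repositories/group/project.git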

If you follow that with (as mentioned in "Git - Delete a Blob"):

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
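
Note that those paths assume a normal working copy; in a bare repository such as the one GitLab serves there is no .git/ directory and the refs live at the top level, so the same cleanup becomes roughly:

# run from inside the bare repository itself
rm -rf refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now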

Then also run (see "git gc --aggressive vs git repack"):

git gc
git repack -Ad      # kills in-pack garbage
git prune           # kills loose garbage

You should end up with a slimmer and smaller bare repo.
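
To confirm the cleanup actually shrank the repository, you can compare the object-store size before and after, for example with:

git count-objects -vH   # 'size' (loose) and 'size-pack' (packed) show the on-disk footprint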

VonC answered Oct 05 '22


Be aware that this will break the history in the repositories of anyone who has pulled or based work on this commit; you will have to tell them.

What you need is to rewrite the branch to remove this commit and then push the result to the remote repository.

First, run an interactive rebase in your local repository:

git rebase -i problematicCommit~1

This will open your default editor. Remove the line for the commit problematicCommit, then save the file and close it.
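
For example, the todo list that opens looks roughly like this (the hashes and messages are invented for illustration); since the rebase starts at problematicCommit~1, the problematic commit is the first entry, and deleting that line removes it from the rewritten history:

pick d4e5f6a Add multi-GB binaries by mistake
pick a1b2c3d Add feature X
pick 9f8e7d6 Fix typo in README

Newer versions of git also accept changing pick to drop on that line for the same effect.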

Next, remove the branch from your remote repository:

git push origin :nameOfTheBranch

Note the colon before the name of the branch.
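
On recent versions of git, an equivalent and more explicit form of the same command is:

git push origin --delete nameOfTheBranch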

Finally, recreate the branch on the remote:

git push origin nameOfTheBranch

This regenerates the branch on the remote without the problematic commit, and new clones will be fast again.

Now, if you still notice that your repository is slow, you can erase the unreferenced objects it still contains (e.g. the ones with this big file).

First, remove all tags and branches that could still be pointing to the old commits. This is important because old commits can only be erased once nothing references them.
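
For example, assuming a stale tag and branch still point at the old history (the names are hypothetical):

git tag -d old-release-tag                     # delete the tag locally
git push origin :refs/tags/old-release-tag     # delete the tag on the remote
git branch -D old-feature-branch               # delete the branch locally
git push origin :old-feature-branch            # delete the branch on the remote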

Then, following VonC's comment (stackoverflow.com/a/28720432/6309), run this in your repository and on the remote:

git gc
git repack -Ad
git prune
blashser answered Oct 05 '22