
Remove large commits from git

We're running a central git repository (gforge) that everyone pulls from and pushes to. Unfortunately, some inept co-workers decided that pushing several 10-100 MB jar files into the repo was a good idea. As a consequence, the server, which we rely on heavily, has run out of disk space.

We only realised this when it was too late and most people had already pulled the new, huge repo. If the commits hadn't been pushed, we could just rebase to snip out those huge commits. Now that everyone has pulled them, what is the best way to remove those commits (or rebase to remove just the large files) without causing chaos when everyone next pulls from or pushes to the repo?

It's supposed to be a small repo for scripts, but it is now about 700 MB in size :-(

agentgonzo asked Jul 09 '12 14:07



2 Answers

The easiest way to avoid chaos is to give the server more disk.

This is a tough one. Removing the files requires removing them from the history, too, which means rewriting it with git filter-branch (newer tools such as git filter-repo and BFG Repo-Cleaner do the same job). This command, for example, would remove <file> from the history:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch <file>' \
--prune-empty --tag-name-filter cat -- --all

The problem is that this rewrites SHA1 hashes, meaning everyone on the team will need to reset to the new branch version or risk some serious headaches. That's all fine and good if no one has work in progress and you all use topic branches. If you're more centralized, your team is large, or many of them keep dirty working directories while they work, there's no way to do this without a little bit of chaos and discord. You could spend quite a while getting everyone's local repository working correctly. That said, git filter-branch is probably the best solution. Just make sure you've got a plan, your team understands it, and everyone backs up their local repositories in case some vital work in progress gets lost or munged.
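To decide which paths to pass as <file> in the filter-branch command above, it can help to list the largest blobs anywhere in the history first. The following is a minimal sketch, not part of the original answer: the function name list_large_blobs and the 10 MiB default threshold are assumptions, and it relies only on git rev-list, git cat-file --batch-check, awk, and sort.

```shell
# list_large_blobs [MIN_BYTES]
# Hypothetical helper: print "<size-in-bytes> <path>" for every blob reachable
# from any ref whose size is at least MIN_BYTES, largest first.
# Run it from inside the repository you want to inspect.
list_large_blobs() {
    min_bytes="${1:-10485760}"   # assumed default threshold: 10 MiB

    # rev-list emits "<sha> <path>" for every object in the history;
    # cat-file looks up each object's type and size and echoes the path back.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" \
            '$1 == "blob" && $2 + 0 >= min + 0 { sub(/^blob /, ""); print }' |
        sort -rn
}
```

Calling list_large_blobs 10000000 in the repository prints one line per oversized file, including files that were later deleted but still live in old commits, which are exactly the ones that keep the repository large.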

One possible plan would be:

  1. Get the team to generate patches of their work in progress, something like git diff HEAD > ~/my_wip (diffing against HEAD includes staged changes as well as unstaged ones).
  2. Get the team to generate patches for their committed but unshared work: git format-patch origin/<branch>
  3. Run git filter-branch and force-push the rewritten branches to the central repository. Make sure the team knows not to pull or push while this is happening.
  4. Have the team issue git fetch && git reset --hard origin/<branch> or have them clone the repository afresh.
  5. Apply their previously committed work with git am <patch>.
  6. Apply their work in progress with git apply, e.g. git apply ~/my_wip.
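The developer-side steps of this plan (1, 2, and 4 to 6) could be wrapped in two small shell functions, roughly as below. This is a sketch under assumptions rather than a tested procedure for any particular team: the function names, the backup directory layout, and the remote name origin are all hypothetical, and it only covers changes to tracked files (untracked files are neither saved nor restored).

```shell
# save_local_work BRANCH BACKUP_DIR
# Hypothetical helper for steps 1 and 2: run BEFORE the history is rewritten.
save_local_work() {
    branch="$1"
    backup_dir="$2"
    mkdir -p "$backup_dir/patches"

    # Step 1: uncommitted changes to tracked files, staged or not.
    git diff --binary HEAD > "$backup_dir/wip.diff"

    # Step 2: one patch file per local commit not yet on origin/BRANCH.
    git format-patch -q -o "$backup_dir/patches" "origin/$branch"
}

# restore_local_work BRANCH BACKUP_DIR
# Hypothetical helper for steps 4 to 6: run AFTER the rewritten history
# has been force-pushed. It discards the local branch state first.
restore_local_work() {
    branch="$1"
    backup_dir="$2"

    # Step 4: move the local branch onto the rewritten history.
    git fetch -q origin
    git reset -q --hard "origin/$branch"

    # Step 5: replay the saved commits in order (0001-..., 0002-..., ...).
    for patch in "$backup_dir"/patches/*.patch; do
        [ -e "$patch" ] || continue      # no local commits were saved
        git am -q "$patch" || return 1
    done

    # Step 6: reapply the uncommitted work, if there was any.
    if [ -s "$backup_dir/wip.diff" ]; then
        git apply "$backup_dir/wip.diff"
    fi
}
```

A developer would call save_local_work <branch> ~/repo-backup before the rewrite and restore_local_work <branch> ~/repo-backup afterwards. Keeping a full copy of the local repository as well is still worthwhile, since restore_local_work starts with a hard reset.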
Christopher answered Sep 30 '22 17:09


See https://help.github.com/articles/remove-sensitive-data . It is written about removing sensitive data from a Git repository, but the same procedure works just as well for removing large files from your history.

Sankha Narayan Guria answered Sep 30 '22 17:09