Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove large .pack file created by git

People also ask

How do I reduce the size of a .pack file in Git?

When you do a Git clone, it will create a copy of the whole repository, this includes the pack file as this is part of the repo too. The only way to reduce the size of the pack file will be by removing contents from your repo.

How do I delete large files in git?

If the large file was added in the most recent commit, you can just run: git rm --cached <filename> to remove the large file, then. git commit --amend -C HEAD to edit the commit.

What is git .pack file?

The packfile is a single file containing the contents of all the objects that were removed from your filesystem. The index is a file that contains offsets into that packfile so you can quickly seek to a specific object.


The issue is that, even though you removed the files, they are still present in previous revisions. That's the whole point of git, is that even if you delete something, you can still get it back by accessing the history.

What you are looking to do is called rewriting history, and it involved the git filter-branch command.

GitHub has a good explanation of the issue on their site. https://help.github.com/articles/remove-sensitive-data

To answer your question more directly, what you basically need to run is this command with unwanted_filename_or_folder replaced accordingly:

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch unwanted_filename_or_folder' --prune-empty

This will remove all references to the files from the active history of the repo.

Next step, to perform a GC cycle to force all references to the file to be expired and purged from the packfile. Nothing needs to be replaced in these commands.

git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
# or, for older git versions (e.g. 1.8.3.1) which don't support --stdin
# git update-ref $(git for-each-ref --format='delete %(refname)' refs/original)
git reflog expire --expire=now --all
git gc --aggressive --prune=now

Scenario A: If your large files were only added to a branch, you don't need to run git filter-branch. You just need to delete the branch and run garbage collection:

git branch -D mybranch
git reflog expire --expire-unreachable=all --all
git gc --prune=all

Scenario B: However, it looks like based on your bash history, that you did merge the changes into master. If you haven't shared the changes with anyone (no git push yet). The easiest thing would be to reset master back to before the merge with the branch that had the big files. This will eliminate all commits from your branch and all commits made to master after the merge. So you might lose changes -- in addition to the big files -- that you may have actually wanted:

git checkout master
git log # Find the commit hash just before the merge
git reset --hard <commit hash>

Then run the steps from the scenario A.

Scenario C: If there were other changes from the branch or changes on master after the merge that you want to keep, it would be best to rebase master and selectively include commits that you want:

git checkout master
git log # Find the commit hash just before the merge
git rebase -i <commit hash>

In your editor, remove lines that correspond to the commits that added the large files, but leave everything else as is. Save and quit. Your master branch should only contain what you want, and no large files. Note that git rebase without -p will eliminate merge commits, so you'll be left with a linear history for master after <commit hash>. This is probably okay for you, but if not, you could try with -p, but git help rebase says combining -p with the -i option explicitly is generally not a good idea unless you know what you are doing.

Then run the commands from scenario A.


As loganfsmyth already stated in his answer, you need to purge git history because the files continue to exist there even after deleting them from the repo. Official GitHub docs recommend BFG which I find easier to use than filter-branch:

Deleting files from history

Download BFG from their website. Make sure you have java installed, then create a mirror clone and purge history. Make sure to replace YOUR_FILE_NAME with the name of the file you'd like to delete:

git clone --mirror git://example.com/some-big-repo.git
java -jar bfg.jar --delete-files YOUR_FILE_NAME some-big-repo.git
cd some-big-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push

Delete a folder

Same as above but use --delete-folders

java -jar bfg.jar --delete-folders YOUR_FOLDER_NAME some-big-repo.git

Other options

BFG also allows for even fancier options (see docs) like these:

Remove all files bigger than 100M from history:

java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git

Important!

When running BFG, be careful that both YOUR_FILE_NAME and YOUR_FOLDER_NAME are indeed just file/folder names. They're not paths, so something like foo/bar.jpg will not work! Instead all files/folders with the specified name will be removed from repo history, no matter which path or branch they existed.


Run the following command, replacing PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the path to the file you want to remove, not just its filename. These arguments will:

  1. Force Git to process, but not check out, the entire history of every branch and tag
  2. Remove the specified file, as well as any empty commits generated as a result
  3. Overwrite your existing tags
git filter-branch --force --index-filter "git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" --prune-empty --tag-name-filter cat -- --all

This will forcefully remove all references to the files from the active history of the repo.

Next step, to perform a GC cycle to force all references to the file to be expired and purged from the pack file. Nothing needs to be replaced in these commands.

git update-ref -d refs/original/refs/remotes/origin/master
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --aggressive --prune=now

One option:

run git gc manually to condense a number of pack files into one or a few pack files. This operation is persistent (i.e. the large pack file will retain its compression behavior) so it may be beneficial to compress a repository periodically with git gc --aggressive

Another option is to save the code and .git somewhere and then delete the .git and start again using this existing code, creating a new git repository (git init).


I am a little late for the show but in case the above answer didn't solve the query then I found another way. Simply remove the specific large file from .pack. I had this issue where I checked in a large 2GB file accidentally. I followed the steps explained in this link: http://www.ducea.com/2012/02/07/howto-completely-remove-a-file-from-git-history/