 

Removed a directory and its files from git history, but the PACK file still contains them

Tags:

git

I wanted to remove a directory and its contents from the history of a git repository to reduce the repository's size. (The directory contained binary assets such as models and textures, which made up by far the largest part of the repository.)

I used the following solution to a previous question:

git filter-branch --tree-filter 'rm -rf the_directory' --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
echo the_directory/ >> .gitignore
git add .gitignore
git commit -m 'Removing the_directory from git history'
git gc
git push origin master --force
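A likely explanation, though the question does not confirm it: `git filter-branch ... HEAD` rewrites only the current branch, so any other branch or tag still reaches the old commits. In addition, the reflog still records the pre-rewrite commits, so a plain `git gc` treats their blobs as reachable and keeps them in the pack. The sketch below, assuming a POSIX shell run inside the repository, counts the commits that still touch the removed path, first from branches and tags only and then including reflogs. `count_from_refs` and `count_with_reflogs` are hypothetical helper names, not git commands.

```shell
# Hypothetical helpers (illustrative names, not part of git).
# Run inside the repository, passing the path that was removed.

# Commits touching <path> that are still reachable from branches and tags.
count_from_refs() {
  git rev-list --count --all -- "$1"
}

# Same count, but also starting from every reflog entry.
count_with_reflogs() {
  git rev-list --count --all --reflog -- "$1"
}
```

For example, run `count_from_refs the_directory` and `count_with_reflogs the_directory`. A nonzero first count means some branch or tag still contains the directory. If only the second count is nonzero, reflog entries are what keep the old objects alive, and `git reflog expire --expire=now --all` followed by `git gc --prune=now` should allow them to be dropped.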

This seemed to have worked: I can no longer find any reference to this directory or its contents in my commit history on GitHub. (I have over 1500 commits, and the directory used to be present throughout; now it is gone. I cannot even find the commit in which I explicitly deleted the directory from the repository, though not from the history.)

Unfortunately, according to GitHub, the size of the repository did not change. I still have a PACK file of 450MB, while the actual repository is now below 14MB.

I used the following git commands to find the largest files:

git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -g | tail -5
git rev-list --objects --all | grep the_id
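The two commands above can be combined into one pipeline that prints size, object id and path together, so there is no need to look up each id by hand. This is a sketch; `largest_blobs` is a hypothetical name. Note one difference from `verify-pack`: `verify-pack` lists every object stored in the pack, reachable or not, whereas `rev-list --objects --all` only walks objects reachable from branches and tags. A big blob that appears in the `verify-pack` output but not here is therefore being kept by something else (for example the reflog) or simply has not been pruned yet.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref as
# "<size-in-bytes> <object-id> <path>". N defaults to 5.
largest_blobs() {
  local n="${1:-5}"
  # rev-list --objects emits "<id> <path>" for every reachable object;
  # cat-file --batch-check adds type and size and keeps the path via %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    grep '^blob ' |
    cut -d' ' -f2- |
    sort -n -r -k1,1 |
    head -n "$n"
}
```

For example, `largest_blobs 10` lists the ten largest reachable blobs, largest first.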

Conclusion: the largest files are still located in the directory I want to get rid of.

I tried various approaches:

  • Remove large .pack file created by git
  • Git Reduce Repo Size
  • Reduce git repository size
  • How to remove unused objects from a git repository?

but the PACK file stays pretty much the same or becomes even larger (~500MB).

How can I reduce the size of the PACK file, and thus of my git repository? More specifically, how can I remove the files of the deleted directory from the PACK file?

asked Oct 29 '17 at 21:10 by Matthias





1 Answer

You can try BFG Repo-Cleaner and its --delete-folders option (run it on a bare clone, i.e. a copy of your repo, for testing):

bfg --delete-folders the_directory --delete-files the_directory --no-blob-protection my-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

By default, this updates your commits as well as all branches and tags.
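Before pushing, it can help to confirm locally that the reflog/gc step actually shrank the repository, and then to publish every rewritten branch and tag rather than only `master`. The sketch below assumes a POSIX shell inside a repository whose history has already been rewritten (by BFG or otherwise); it does not rewrite anything itself. `repo_kib`, `prune_and_measure` and `push_rewritten` are hypothetical helper names. If you worked in a `git clone --mirror` copy, as the BFG documentation suggests, a plain `git push` from the mirror already pushes all refs. GitHub may also keep reporting the old size for a while after the push; its guide on removing sensitive data mentions contacting GitHub Support for server-side cleanup.

```shell
# Hypothetical helpers (illustrative names, not part of git or BFG).
# Run them inside the repository *after* history has been rewritten.

# Total on-disk size in KiB: loose objects ("size") plus packs ("size-pack").
repo_kib() {
  git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Drop every reflog entry, then repack and delete unreachable objects,
# printing the size before and after.
prune_and_measure() {
  local before after
  before=$(repo_kib)
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive --quiet
  after=$(repo_kib)
  echo "before: ${before} KiB, after: ${after} KiB"
}

# Force-push every rewritten branch and tag. git refuses to combine
# --all and --tags in one push, so this takes two commands.
push_rewritten() {
  local remote="${1:-origin}"
  git push --force --all "$remote"
  git push --force --tags "$remote"
}
```

For example, run `prune_and_measure` and, once the numbers look right, `push_rewritten origin`.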

answered Nov 15 '22 at 11:11 by VonC
