I wanted to remove a directory and its contents from the history of a git repository to reduce the size of this git repository. (The directory contained binary assets such as models and textures and contributed by far the most to the size of the git repository.)
I used the following solution to a previous question:
git filter-branch --tree-filter 'rm -rf the_directory' --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
echo the_directory/ >> .gitignore
git add .gitignore
git commit -m 'Removing the_directory from git history'
git gc
git push origin master --force
This seemed to have worked, because I can no longer find any references to this directory or its contents in my commit history on GitHub. (I have more than 1,500 commits; the directory used to be present throughout but no longer is. I cannot even find the commit in which I explicitly deleted the directory from the repository but not from the history.)
Unfortunately, the size of the repository has not changed according to GitHub. I still have a PACK file of 450MB (while the actual repository is now below 14MB).
I used the following git commands to find the largest files:
git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -g | tail -5
git rev-list --objects --all | grep the_id
Conclusion: the largest files are still the ones located in the directory I want to get rid of.
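The two commands above can be combined into one pass that prints sizes and paths together. This is only a sketch: `largest_blobs` is a made-up helper name, and it has to be run from inside the repository. Note that `git verify-pack` lists every object stored in the pack, whereas `git rev-list --objects --all` lists only objects still reachable from some ref, so a file that shows up here is still referenced by a branch, tag, or other ref.

```shell
# Hypothetical helper: print the N largest blobs that are still reachable
# from any ref, as "blob <sha> <size-in-bytes> <path>", largest last.
largest_blobs() {
  count="${1:-5}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3 -n |
    tail -n "$count"
}
```

Calling `largest_blobs 5` prints the five largest reachable blobs. If a large file appears in the `verify-pack` output but not here, it is no longer reachable and is only waiting to be pruned.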
I tried various approaches:
but the PACK file stays pretty much the same or becomes even larger (~500MB).
How can I reduce the size of the PACK file, and thus of my git repository? In particular, how can I remove from the PACK file the files contained in the directory I removed?
The easiest way to delete a file in your Git repository is to execute the “git rm” command and to specify the file to be deleted. Note that by using the “git rm” command, the file will also be deleted from the filesystem. This only affects the next commit; earlier commits still contain the file.
What you are looking to do is called rewriting history, and it involves the git filter-branch command. This will remove all references to the files from the active history of the repo. The next step is to perform a GC cycle so that all remaining references to the files expire and the objects are purged from the packfile.
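The rewrite-then-GC sequence described above might look like the sketch below. It is not a definitive recipe: `purge_dir_from_history` is a made-up name, and it should only be tried on a throwaway clone with a clean working tree. Two details matter for the pack size. First, passing `-- --all` rewrites every ref rather than only `HEAD`, so other branches and tags do not keep the old objects alive. Second, a plain `git gc` keeps unreachable objects for a grace period and the reflog still points at the old commits, which is why the reflog is expired first and `--prune=now` is passed.

```shell
# Hypothetical helper: remove a directory from the history of all refs,
# then drop everything that kept the old objects alive.
purge_dir_from_history() {
  dir="$1"

  # Rewrite every ref; --index-filter edits the index without a checkout.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -r --cached --ignore-unmatch -q -- '$dir'" \
    --prune-empty --tag-name-filter cat -- --all

  # filter-branch leaves backups of the old refs under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done

  # Expire reflog entries, then repack and prune unreachable objects now.
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
}
```

Usage would be `purge_dir_from_history the_directory` from the top level of the clone, followed by a forced push of every rewritten branch and tag.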
If you're familiar with the terminal window or the DOS prompt, you can easily delete a Git repository from the command line. Just run the rm command with the -f and -r switches to recursively remove the .git folder and all of the files and folders it contains. Note that this deletes the entire history, not just one directory from it.
You can try BFG Repo-Cleaner and its --delete-folders option:
(Do so on a bare clone, that is, a copy of your repo made for testing.)
bfg --delete-folders the_directory --delete-files the_directory --no-blob-protection my-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
By default, that updates your commits across all branches and tags.
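To check whether the cleanup actually worked before pushing, something like the following can be run inside the rewritten clone. Both function names are made up for illustration. `git count-objects -v` reports the total pack size in KiB on its `size-pack` line, and the second helper lists any ref whose history still touches the directory.

```shell
# Hypothetical helpers for verifying a cleanup; run inside the repository.

# Total size of all packfiles in KiB, as reported by git.
pack_size_kib() {
  git count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Print every ref whose history still contains a commit touching the path.
refs_touching_path() {
  path="$1"
  git for-each-ref --format='%(refname)' |
    while read -r ref; do
      if [ -n "$(git rev-list -n 1 "$ref" -- "$path" 2>/dev/null)" ]; then
        echo "$ref"
      fi
    done
}
```

Comparing `pack_size_kib` before and after the cleanup shows whether the pack shrank locally. If `refs_touching_path the_directory` prints anything, those refs still hold the old history and will keep the large objects in the pack.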