 

Remove file from git repository (history)

(solved, see bottom of the question body)
I have been looking for a way to do this for a long time. What I have found so far:

  • http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/ and
  • http://progit.org/book/ch9-7.html

Both describe pretty much the same method, but both leave objects in pack files, so I'm stuck.
What I tried:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_name'
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc

Still have files in the pack, and this is how I know it:

git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3 

And this:

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch file_name" HEAD
rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune

The same...

I tried the git clone trick; it removed some of the files (~3000 of them), but the largest files are still there...

I have some large legacy files in the repository, ~200M, and I really don't want them there... And I don't want to reset the repository to 0 :(
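The verify-pack check above reports only object IDs and sizes. As a rough sketch (the helper name largest_blobs is made up for illustration), the following lists the largest blobs reachable from any ref together with the path they were committed under, which makes it easier to decide what to filter out:

```shell
#!/bin/sh
# largest_blobs is a hypothetical helper, not a Git command.
# Prints the N largest blobs reachable from any ref as:
#   blob <object-id> <size-in-bytes> <path>
# Run inside a repository; N defaults to 5.
largest_blobs() {
    n="${1:-5}"
    # rev-list emits "<id> <path>"; cat-file looks up type and size,
    # and %(rest) carries the path through unchanged.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -k 3 -n |
        tail -n "$n"
}
```

Calling `largest_blobs 3` inside the repository prints the three biggest blobs, smallest first. Note that it only sees objects that are still reachable, so it shows what some ref is still holding on to.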

SOLUTION: This is the shortest way to get rid of the files:

  1. check .git/packed-refs - my problem was that it contained a refs/remotes/origin/master line for a remote repository; delete that line, otherwise git won't remove those files
  2. (optional) git verify-pack -v .git/objects/pack/#{pack-name}.idx | sort -k 3 -n | tail -5 - to check for the largest files
  3. (optional) git rev-list --objects --all | grep a0d770a97ff0fac0be1d777b32cc67fe69eb9a98 - to check what are those files
  4. git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_names' - to remove a file from all revisions
  5. rm -rf .git/refs/original/ - to remove git's backup
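  6. git reflog expire --all --expire='0 days' - to expire all reflog entries, so they no longer keep the old commits reachable
  7. git fsck --full --unreachable - to check that the removed objects are now reported as unreachable
  8. git repack -A -d - repacking; unreachable objects are left loose, outside the new pack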
  9. git prune - to finally remove those objects
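
Steps 4 to 9 above can be strung together roughly as follows. This is a sketch under a few assumptions, not the exact commands from the list: purge_path is a made-up helper name, the backup refs are deleted with git for-each-ref and git update-ref (which also catches packed refs) instead of rm -rf, and --expire=now is used for the reflog. It does not touch remote-tracking refs such as refs/remotes/origin/master, so step 1 still has to be done separately.

```shell
#!/bin/sh
# purge_path is a hypothetical helper, not a Git command.
# It rewrites history: try it on a copy of the repository first.
purge_path() {
    target="$1"   # path to remove; must not contain a single quote

    # Rewrite the current branch, dropping the path from every commit.
    # The variable silences the deprecation notice newer Git versions print.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter \
        "git rm -r --cached --ignore-unmatch -- '$target'" -- HEAD || return 1

    # Delete filter-branch's backup refs, whether loose or packed.
    git for-each-ref --format='delete %(refname)' refs/original |
        git update-ref --stdin || return 1

    # Drop reflog entries so they no longer keep the old commits reachable.
    git reflog expire --all --expire=now || return 1

    # Repack reachable objects; unreachable ones stay loose and are pruned.
    git repack -A -d -q && git prune
}
```

Run from the top level of the working tree, for example `purge_path path/to/big_file`. To rewrite every branch and tag rather than only the current branch, `-- HEAD` would become `-- --all`.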
Boris Churzin asked Jan 29 '10 19:01



2 Answers

I can't say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn't call the large blob an unreachable object, even though you've expired your reflog and removed the original (unpacked) refs.

Here's what I'd do (after git filter-branch and git gc have been done):

1) Make sure original refs are gone:

rm -rf .git/refs/original

2) Expire all reflog entries:

git reflog expire --all --expire='0 days'

3) Check for old packed refs

This could potentially be tricky, depending on how many packed refs you have. I don't know of any Git commands that automate this, so I think you'll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don't need to be there, delete them (remove the line for that ref).

After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:

git fsck --full --unreachable

If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.
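
If git fsck still does not report the blob as unreachable, it can help to ask Git which refs are keeping it alive rather than reading packed-refs by eye. A minimal sketch, assuming you already know the blob's full object ID (refs_reaching is a made-up helper name); git for-each-ref lists loose and packed refs alike:

```shell
#!/bin/sh
# refs_reaching is a hypothetical helper, not a Git command.
# Prints every ref (loose or packed) from which the given object is reachable.
refs_reaching() {
    obj="$1"   # full object ID of the blob to look for
    git for-each-ref --format='%(refname)' |
        while read -r ref; do
            # Walk everything reachable from this ref and look for the object.
            if git rev-list --objects "$ref" | grep -q "^$obj"; then
                echo "$ref"
            fi
        done
}
```

Any stale ref it prints can then be removed with `git update-ref -d <refname>`, which works for packed refs as well and avoids editing .git/packed-refs by hand. On a large repository this walks history once per ref, so it can be slow.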

4) Repack your packed archive(s)

git repack -A -d

This will ensure that the unreachable objects get unpacked and stay unpacked.

5) Prune loose (unreachable) objects

git prune

And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don't know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.

Dan Moulding answered Sep 28 '22 10:09


I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing files from Git history. One way in which it makes your life easier here is that it handles all references by default (all tags, branches, stuff like refs/remotes/origin/master, etc.), and it's also 10-50x faster.

You should carefully follow these steps here: http://rtyley.github.com/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar --delete-files file_name my-repo.git

Any file named file_name (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive 
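
Because of the "isn't in your latest commit" caveat, it is worth checking whether the file is still present in HEAD before running the BFG. A small sketch (paths_in_head is a made-up helper name); it compares file names rather than full paths, on the assumption that --delete-files also matches on the file name only:

```shell
#!/bin/sh
# paths_in_head is a hypothetical helper, not part of Git or the BFG.
# Prints the paths in the latest commit whose file name equals $1.
# Exit status is 0 if at least one such path exists, 1 otherwise.
paths_in_head() {
    git ls-tree -r --name-only HEAD |
        awk -F/ -v name="$1" '$NF == name { print; found = 1 } END { exit !found }'
}
```

If `paths_in_head file_name` prints anything, delete those files and commit the removal first, so that the latest commit no longer contains them.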

The BFG is generally much simpler to use than git-filter-branch - the options are tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley answered Sep 28 '22 11:09