
Prune binary data from a git repository after the fact

I accidentally committed some large binary data into some commits. Since then I've updated my .gitignore, and those files are no longer being committed. But I'd like to go back into the older commits and selectively prune out this data from the repository, removing a couple directories that should have been in .gitignore. I don't want to remove the commits themselves.

How would I go about accomplishing this? My preferred method would be some way to retroactively apply the .gitignore rules to old commits. An answer that uses this method would also be generally useful to others, since I'm sure my problem is not unique, and it could be applied quickly as a general solution without lots of customization for each user's directory structure.

Is this possible, either the easy way I suggest above, or in some more complicated manner?

Myrddin Emrys asked Dec 30 '10 18:12




1 Answer

The solution in this answer worked perfectly for me:

You can also test your cleanup process with a tool like BFG Repo-Cleaner, as in this answer:

java -jar bfg.jar --delete-files '*.{jpg,png,mp4,m4v,ogv,webm}' "${bare_repo_dir}"

(Note that BFG makes sure it doesn't delete anything in your latest commit, so you first need to remove those files from the current index and make a "clean" commit. BFG will then clean all earlier commits.)
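The steps around the BFG run can be sketched as two small shell helpers. This is only a sketch under stated assumptions: the function names `untrack_paths` and `expire_and_gc` and the commit message are made up for illustration, and the reflog-expire and gc step is not part of the quoted answer. It is the usual follow-up after a history rewrite, because the old objects stay on disk until they are pruned.

```shell
#!/bin/sh
# Hypothetical helper: make the "clean" commit before running BFG.
# Stops tracking the given paths but leaves the files on disk.
untrack_paths() {
    # --cached: remove from the index only; -r: recurse into directories;
    # --ignore-unmatch: do not fail if a path is already untracked.
    git rm -r --cached --quiet --ignore-unmatch -- "$@" &&
    git commit --quiet -m "Stop tracking large binary data"
}

# Hypothetical helper: run inside the rewritten repository after BFG.
# Drops reflog entries that still point at the old commits, then
# deletes the objects that are no longer reachable.
expire_and_gc() {
    git reflog expire --expire=now --all &&
    git gc --quiet --prune=now --aggressive
}
```

A possible order of operations: call `untrack_paths` on the unwanted directories in a normal clone and push the result, run BFG against a fresh `git clone --mirror` copy, then call `expire_and_gc` inside that copy before pushing. For whole directories, as in the question, BFG also has a `--delete-folders` option, which matches on folder name rather than on the path within the repository.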

Ed. answered Oct 06 '22 07:10