I'm rewriting the history of a fairly big repo using git filter-branch --tree-filter, and it's taking a few hours. I see that Git is using a temporary directory to store its intermediate work as it goes along. Does that mean it's possible to resume a rewrite if it gets interrupted? If so, how?
Edit: The operation I'm doing is moving a couple of directories. They're currently in subdirectories, but I now need them to be in the root. For example:
dir1
- dir2
- dir3
- dir4
becomes
dir1
- dir2
dir3
dir4
Of course my directory structure is a lot more complex than that, but that's the gist of what I'm trying to do.
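For reference, here is a minimal sketch of that move done with --tree-filter. It assumes the directories are literally dir1/dir3 and dir1/dir4 relative to the repository root (placeholders taken from the example above, not real paths), and it builds a throwaway demo repository in a temp directory so it is safe to try. FILTER_BRANCH_SQUELCH_WARNING only suppresses the warning (and pause) that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
# Sketch: move dir1/dir3 and dir1/dir4 to the repository root in every commit
# using --tree-filter. Paths are hypothetical placeholders from the example.
set -e

# Suppress the warning and 10-second pause newer Git prints for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Build a small demo repository with the "before" layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir -p dir1/dir2 dir1/dir3 dir1/dir4
for d in dir2 dir3 dir4; do
  echo "$d" > "dir1/$d/file.txt"
done
git add .
git commit -q -m "initial layout"

# The tree filter runs once per commit, inside a checkout of that commit;
# whatever it leaves on disk becomes the rewritten commit's tree.
git filter-branch --tree-filter '
  for d in dir3 dir4; do
    if [ -d "dir1/$d" ]; then
      mv "dir1/$d" "$d"
    fi
  done
' -- --all

# List the files in the rewritten HEAD commit.
git ls-tree -r --name-only HEAD
```

The checkout of every commit is what makes --tree-filter slow on a big repo. Note also that if some commit already has a dir3 at the root, mv would nest dir1/dir3 inside it, so a real history may need an extra check.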
git filter-branch doesn't itself support a suspend/resume pattern of use. Although it writes temporary data out to a .git-rewrite folder, there's no actual support for resuming based on the contents of this directory. If you run git filter-branch on a repository that's had a previously aborted filter-branch operation, it'll either ask you to delete that temp folder or, with the --force option, delete it itself.
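A small sketch of that behaviour, assuming a Git version that still ships filter-branch: it fakes the leftover .git-rewrite directory of an interrupted run inside a throwaway repository, then shows that a plain run refuses to start while a run with --force discards the directory and starts from the beginning. The no-op tree filter true is only a placeholder.

```shell
#!/bin/sh
# Sketch: a leftover temp directory blocks filter-branch; --force restarts it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with a single commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email demo@example.com
git config user.name demo
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# Simulate the debris of an interrupted run (.git-rewrite is the default
# temporary directory). Nothing in it is ever used to resume.
mkdir -p .git-rewrite/t

# Without --force, filter-branch refuses to start.
if git filter-branch --tree-filter 'true' -- --all; then
  first_run=started
else
  first_run=refused
fi
echo "without --force: $first_run"

# With --force, the old directory is deleted and the rewrite starts over.
if git filter-branch --force --tree-filter 'true' -- --all; then
  second_run=started
else
  second_run=refused
fi
echo "with --force: $second_run"
```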
The underlying problem is that git-filter-branch is slow on big repos. If the process were much faster, there'd be no motivation to attempt a resume. So you've got a few options:
- Put the repository in RAM: git-filter-branch is very IO-intensive, and will run faster with your repository sitting in RAM (for example, on a RAM disk).
- Use --index-filter rather than --tree-filter. It's similar to the tree filter but doesn't check out the file tree, which makes it faster; however, it does require you to rewrite your file alterations in terms of Git index commands.
- Keep in mind that git-filter-branch itself is single-threaded, so extra CPU cores won't speed up the rewrite itself.
- Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch: on large repos it's 50-150x faster, which turns a job that takes several hours into one that takes just a few minutes.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
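The --index-filter and RAM suggestions can be combined in a sketch like the following, again for the question's placeholder directories dir1/dir3 and dir1/dir4. Instead of moving files on disk, the index filter removes each subtree from the index with git rm --cached and re-adds the same tree object at the root with git read-tree --prefix; GIT_COMMIT is the variable filter-branch sets to the commit being rewritten. The temporary directory is put under /dev/shm with filter-branch's -d option, assuming a Linux system where /dev/shm is a RAM-backed tmpfs; otherwise the sketch falls back to an ordinary temp directory.

```shell
#!/bin/sh
# Sketch: dir1/dir3 and dir1/dir4 -> root using --index-filter (no checkout),
# with the temporary directory on /dev/shm when available. Paths are placeholders.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway demo repository with the "before" layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir -p dir1/dir2 dir1/dir3 dir1/dir4
for d in dir2 dir3 dir4; do
  echo "$d" > "dir1/$d/file.txt"
done
git add .
git commit -q -m "initial layout"

# Assumption: /dev/shm is RAM-backed. filter-branch creates the -d directory
# itself and removes it when the rewrite finishes.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
  scratch=$(mktemp -d /dev/shm/filter-branch.XXXXXX)
else
  scratch=$(mktemp -d)
fi

# For each commit: if dir1/$d is a tree, drop it from the index and re-add
# the same tree object at the root. Error handling is minimal for brevity.
git filter-branch -d "$scratch/rewrite" --index-filter '
  for d in dir3 dir4; do
    if [ "$(git cat-file -t "$GIT_COMMIT:dir1/$d" 2>/dev/null)" = tree ]; then
      git rm -r -q --cached "dir1/$d"
      git read-tree --prefix="$d/" "$GIT_COMMIT:dir1/$d"
    fi
  done
' -- --all

# List the files in the rewritten HEAD commit, then clean up.
git ls-tree -r --name-only HEAD
rm -rf "$scratch"
```

Because git read-tree --prefix refuses to overwrite existing index entries, commits that already contain a root-level dir3/ or dir4/ would need extra handling in a real history.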
Roberto mentioned this in his answer, but I want to give a benchmark for it: if your git filter-branch operation is taking too long to complete, consider an AWS high-memory instance.
I once had to filter-branch and merge together 35 different repositories, each with two years of dozens-of-commits-per-day history. My script failed to complete in 25 hours on my laptop. It completed in 45 minutes on an Amazon m2.4xlarge instance.
Total cost? $1.64, less than I spend on a 20oz soda.
BFG sounds like a great tool, and I'd encourage anyone who routinely rewrites history to try it out. But if you just need something to work and have easy access to AWS, filter-branch is trivially easy.
In 2016 this is even cheaper: just mosey on over to the Spot Advisor and find yourself something of the "cluster compute for $0.30/hour" variety.