How can I resume a git history rewrite?

I'm rewriting the history of a fairly big repo using git filter-branch --tree-filter and it's taking a few hours. I see that git is using a temporary directory to store its intermediate work as it goes along. Does that mean it's possible to resume a rewrite if it gets interrupted? If so, how?

Edit

The operation I'm doing is moving a couple of directories. These are currently in subdirectories, but I now need them to be in the root.

e.g.

dir1
- dir2
- dir3
- dir4

becomes

dir1
- dir2
dir3
dir4

Of course my directory structure is a lot more complex than that, but that's the gist of what I'm trying to do.
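To make the operation concrete, here is a minimal Python sketch of a tree filter that performs the move in the example. It is not the asker's actual command. The script name, its path and the `MOVES` mapping are hypothetical. It relies on two documented filter-branch behaviours: a `--tree-filter` command runs from the root of each temporarily checked-out commit, and `GIT_COMMIT` is set while it runs.

```python
#!/usr/bin/env python3
"""Hypothetical --tree-filter helper that hoists nested directories to the root.

Assumed invocation (use an absolute path, since filter-branch runs the filter
from the root of its temporary checkout, not from your working directory):
    git filter-branch --tree-filter 'python3 /abs/path/hoist_dirs.py' -- --all
"""
import os
import shutil
import sys

# Assumed mapping, taken from the dir1/dir3 -> dir3 example in the question.
MOVES = {
    "dir1/dir3": "dir3",
    "dir1/dir4": "dir4",
}


def hoist_dirs(root: str, moves: dict) -> list:
    """Move each source directory under `root` to its destination.

    Sources missing from a given commit are skipped. Returns the (src, dst)
    pairs that were actually moved.
    """
    moved = []
    for src, dst in moves.items():
        src_path = os.path.join(root, src)
        dst_path = os.path.join(root, dst)
        if not os.path.isdir(src_path):
            continue  # directory absent in this commit
        if os.path.exists(dst_path):
            # shutil.move would nest src *inside* an existing dst, so refuse.
            raise FileExistsError(f"cannot move {src} to {dst}: destination exists")
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        shutil.move(src_path, dst_path)
        moved.append((src, dst))
    return moved


if __name__ == "__main__":
    # filter-branch sets GIT_COMMIT for each commit it rewrites; refuse to touch
    # the current directory when the script is run any other way.
    if "GIT_COMMIT" not in os.environ:
        print("run this via git filter-branch --tree-filter", file=sys.stderr)
    else:
        try:
            hoist_dirs(os.getcwd(), MOVES)
        except FileExistsError as exc:
            sys.exit(str(exc))  # a non-zero exit makes filter-branch abort
```

Refusing to move onto an existing destination is a deliberate choice. Otherwise `shutil.move` would silently produce `dir3/dir3` in any commit that already had a root-level `dir3`.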

asked Apr 22 '13 at 16:04 by alnorth29


2 Answers

git filter-branch doesn't support a suspend/resume pattern of use. Although it writes temporary data to a .git-rewrite folder, it can't resume from the contents of that directory. If you run git filter-branch on a repository where a previous filter-branch operation was aborted, it will either refuse to start until you delete that temp folder or, with the --force option, delete it itself.
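Because nothing in that directory can be reused, recovering from an interruption just means clearing it and starting over. Below is a minimal Python sketch of that cleanup. It assumes the default .git-rewrite location at the top of the working tree, and the function names are hypothetical. In practice, `rm -rf .git-rewrite` or rerunning with --force does the same job.

```python
import shutil
from pathlib import Path
from typing import Optional

# filter-branch's default scratch directory, relative to the top of the working
# tree; its -d option can point it elsewhere.
DEFAULT_TEMPDIR = ".git-rewrite"


def leftover_rewrite_dir(repo_root: str, tempdir: str = DEFAULT_TEMPDIR) -> Optional[Path]:
    """Return the scratch directory if an interrupted run left one behind."""
    path = Path(repo_root) / tempdir
    return path if path.is_dir() else None


def discard_leftover(repo_root: str, tempdir: str = DEFAULT_TEMPDIR) -> bool:
    """Delete leftover scratch data so filter-branch can start over.

    Nothing inside the directory is reused: the next run rewrites every
    commit again from the start. Returns True if a directory was removed.
    """
    path = leftover_rewrite_dir(repo_root, tempdir)
    if path is None:
        return False
    shutil.rmtree(path)
    return True
```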

The underlying problem is that git-filter-branch is slow on big repos. If it were much faster, there'd be no motivation to attempt a resume. So you've got a few options:

Make git-filter-branch go a bit faster...

  • use a RAM-disk: git-filter-branch is very IO-intensive and will run faster with your repository sitting in RAM (its -d option sets where the temporary directory goes, so that can live on the RAM-disk too).
  • use --index-filter rather than --tree-filter: it's similar to --tree-filter but doesn't check out the file tree, which makes it faster. The catch is that you have to express your file changes as git index commands (e.g. git ls-files and git update-index).
  • use cloud computing and rent a machine with fast RAM and a high clock speed (don't bother with multiple cores unless your own filter commands are multi-threaded, as git-filter-branch itself is single-threaded)
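Here is what "git index commands" looks like in practice: a minimal Python sketch of the question's directory move, done purely on the index with no checkout. It adapts the `git ls-files -s | git update-index --index-info` pattern from the git-filter-branch documentation's examples. The script name and the `MOVES` mapping are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical --index-filter helper: rename paths in the index, no checkout.

Assumed invocation (absolute path, as with any filter script):
    git filter-branch --index-filter 'python3 /abs/path/move_index.py' -- --all
filter-branch points GIT_INDEX_FILE at the index of the commit being rewritten.
"""
import os
import subprocess
import sys

# Assumed prefix mapping for the question's example. The trailing slashes keep
# e.g. "dir1/dir30/" from matching "dir1/dir3/".
MOVES = {
    "dir1/dir3/": "dir3/",
    "dir1/dir4/": "dir4/",
}

# surrogateescape lets paths that aren't valid UTF-8 round-trip unchanged.
TEXT = {"encoding": "utf-8", "errors": "surrogateescape"}


def remap_entry(entry: str, moves: dict) -> str:
    """Rewrite the path in one `git ls-files -s` entry: "<mode> <sha> <stage>\\t<path>"."""
    meta, path = entry.split("\t", 1)
    for old, new in moves.items():
        if path.startswith(old):
            path = new + path[len(old):]
            break
    return f"{meta}\t{path}"


def main(index_file: str) -> None:
    # -z: NUL-terminated entries with unquoted paths.
    listing = subprocess.run(
        ["git", "ls-files", "-s", "-z"], check=True, capture_output=True, **TEXT
    ).stdout
    entries = [e for e in listing.split("\0") if e]
    if not entries:
        return  # empty tree in this commit: nothing to rename
    new_info = "".join(remap_entry(e, MOVES) + "\0" for e in entries)
    new_index = index_file + ".new"
    if os.path.exists(new_index):
        os.remove(new_index)  # never append to a stale index from a failed run
    # Build a fresh index from the rewritten entries, then swap it into place.
    # (-z must precede --index-info, which has to be the last argument.)
    subprocess.run(
        ["git", "update-index", "-z", "--index-info"],
        input=new_info, check=True,
        env={**os.environ, "GIT_INDEX_FILE": new_index}, **TEXT,
    )
    os.replace(new_index, index_file)


if __name__ == "__main__":
    index = os.environ.get("GIT_INDEX_FILE")
    if index is None:
        print("run this via git filter-branch --index-filter", file=sys.stderr)
    else:
        main(index)
```

Only blob paths change here. File contents are never read or written, which is why this style of filter avoids the IO cost of a tree filter. This sketch doesn't check whether `dir3/` or `dir4/` already exist at the root in some commit.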

...or use The BFG (way faster)

The BFG Repo-Cleaner is a simpler, faster alternative to git-filter-branch: on large repos it's 50-150x faster, which turns a job that takes several hours into one that takes just a few minutes.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

answered Oct 14 '22 at 03:10 by Roberto Tyley


Roberto mentioned this in his answer, but I want to give a benchmark for it: if your git filter-branch operation is taking too long to complete, consider an AWS high-memory instance.

I once had to filter-branch and merge together 35 different repositories, each with two years of history at dozens of commits per day. My script failed to complete within 25 hours on my laptop. It completed in 45 minutes on an m2.4xlarge instance on Amazon EC2.

Total cost?

$1.64, less than I spend on a 20 oz soda.

BFG sounds like a great tool, and I'd encourage anyone who routinely rewrites history to try it out. But if you just need something that works and have easy access to AWS, running filter-branch on a big instance is trivially easy.

In 2016 this is even cheaper. Just mosey on over to the Spot Advisor and find yourself something of the "cluster compute for $0.30/hour" variety.

answered Oct 14 '22 at 05:10 by Christopher