
How do I clone a git repo that has become too large?

Tags:

git

I am working with a git repo that is very large (> 10 GB). The repo itself has many large binary files, with many versions of each (> 100 MB). The reasons for this are beyond the scope of this question.

Currently, it is no longer possible to properly clone from the repo, as the server itself will run out of memory (it has 12 GB) and return a failure code. I would paste it here, but it takes well over an hour to get to the point of failure.

Are there any methods by which I can make a clone succeed? Even one which grabs a partial copy of the repo? Or a way I can clone in bite sized chunks that won't make the server choke?

Asked Sep 17 '13 by Charles Randall

People also ask

How big is too big for a git repo?

The total repository size will be limited to 10 GB. You will receive warning messages as your repository grows, so that you are aware you are approaching the size limit. Eventually, if the repository size exceeds the limit, you will receive an error message and the push will be blocked.

How do I compress a git repository?

Remove the file from your project's current file tree; remove the file from the repository history by rewriting Git history, deleting the file from all commits containing it; remove all reflog history that refers to the old commits; then repack the repository, garbage-collecting the now-unused data, using git gc.
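Those steps can be sketched with stock Git commands. This uses git filter-branch (which ships with Git; git filter-repo is the modern replacement), and `big-file.bin` is a placeholder for the file you want to purge:

```shell
# rewrite every commit on every branch, dropping big-file.bin from the index
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch big-file.bin' \
  --prune-empty --tag-name-filter cat -- --all

# delete the backup refs filter-branch leaves behind, plus old reflog entries,
# so the old blobs become unreferenced
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all

# repack, garbage-collecting the now-unreferenced data
git gc --prune=now --aggressive
```

Until the backup refs and reflog are cleared, git gc cannot actually reclaim the space, which is why the middle two commands matter.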


2 Answers

One answer to 'How do I clone a git repo that has become too large?' is 'Reduce its size by removing the big blobs'.

(I must concede that the asker clarifies in a comment that repo-fixing is 'beyond the scope of this question'; however, the comment also says 'I am looking for a quick fix to allow me to clone the repo right now', so I'm posting this answer because a) it's possible they're not aware of The BFG and so overestimate the difficulty of cleaning a repo, and b) it is, indeed, very freakin' quick.)

To clean the repo easily and quickly, use The BFG:

$ java -jar bfg.jar  --strip-blobs-bigger-than 100M  my-repo.git

Any old files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive

Once this is done, your repo will be much smaller and should clone without problems.
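Put together, the mirror-based workflow that the BFG documentation describes looks roughly like this (the URL is a placeholder):

```shell
# clone a bare, mirror copy of the repo (placeholder URL)
git clone --mirror https://example.com/my-repo.git

# strip all blobs over 100 MB from history (your latest commit is protected)
java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

# expire the old reflog entries, then garbage-collect the dead blobs
cd my-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# push the rewritten history back to the server
git push
```

Everyone who has an existing clone will then need to re-clone, since the history has been rewritten.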

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Answered Oct 03 '22 by Roberto Tyley


You can try passing the --depth option to git clone, which fetches only a limited amount of history. Alternatively, you could copy the repository directory directly with rsync or a similar tool.
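For the shallow-clone route, a minimal sketch (the URL is a placeholder):

```shell
# fetch only the most recent commit on each branch
git clone --depth 1 https://example.com/my-repo.git

# later, pull in more history in bite-sized chunks...
git -C my-repo fetch --deepen=100

# ...or fetch the full history, once the server can cope
git -C my-repo fetch --unshallow
```

This keeps the initial transfer small, since old versions of the large binaries are never downloaded, and lets you deepen the history incrementally rather than in one memory-hungry pack.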

Answered Oct 04 '22 by Michael Krelin - hacker