 

How do I download a large Git Repository?

Tags:

git

bitbucket

I have a Git repository on BitBucket which is more than 4GB.

I can't clone the repository using the normal Git command because it fails (it appears to work for a long time but then rolls back).
I also can't download the repository as a zip from the BitBucket interface as:

Feature unavailable: This repository is too large for us to generate a download.

Is there any way to download a Git repository incrementally?

asked Dec 21 '15 by Sebastian Gray


People also ask

How do I download a whole git repository?

To download from GitHub, navigate to the top level of the project; a green "Code" button will be visible on the right. Choose the Download ZIP option from the Code pull-down menu. That ZIP file will contain the entire repository content.

What is the maximum size of a git repository?

The maximum repository size is 10GB. You will receive warning messages as your repository grows to ensure you're aware you're approaching the limit. Eventually, if the repository size exceeds the limit, you will receive an error message and the push will be blocked.

How do I clone a large repository in Git?

First you need to get a shallow clone of depth 1, then fetch a few more commits using the 'git fetch' command. Once you have fetched a sufficient number of commits, you can fetch all the remaining ones. This way you can clone a large repository.
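A minimal sketch of that sequence (the depth values and URL are placeholders):

git clone --depth 1 <repo_url>    # shallow clone with only the latest commit
git fetch --depth=100             # deepen the history to the last 100 commits
git fetch --unshallow             # finally fetch all remaining history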

What is a git repository?

Due to the distributed nature of Git, a repository generally contains its entire history. That means that even if you deleted a 1 GB file from your repository two years ago, it is still transferred on every clone since then. Moreover, Git's compression algorithms are optimized for source code, so large binary files compress poorly.

How do I delete old files from my Git repository?

Any old files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data. Once this is done, your repo will be much smaller and should clone without problems.
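This description matches the BFG Repo-Cleaner (an assumption; the original answer likely named the tool). A minimal sketch assuming BFG is installed and the repo path is a placeholder:

bfg --strip-blobs-bigger-than 100M my-repo.git    # remove blobs over 100MB from history
cd my-repo.git
git reflog expire --expire=now --all              # drop reflog references to the dead data
git gc --prune=now --aggressive                   # garbage-collect the removed objects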

What is a “large” repository?

A large Git repository can be a repository with a deep history and a large number of commits. Cloning this kind of repository and/or running certain local Git commands on it (e.g. git blame) can be slow. In some cases, a "shallow" clone, which only clones the most recent commits (git clone --depth <depth> <url>), can be a viable solution.


4 Answers

If you don't need to pull the whole history, you could specify the number of revisions to clone:

git clone <repo_url> --depth=1

Of course, this might not help if you have a particularly large file in your repository.
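If the server supports partial clone (an assumption; support varies by host), filtering out large blobs at clone time is another option:

git clone --filter=blob:limit=1m <repo_url>    # blobs over 1MB are fetched only when needed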

answered Oct 22 '22 by Puddler


For me, the approach described in this answer worked perfectly: https://stackoverflow.com/a/22317479/6332374, but with one little improvement because of the big repo:

First:

git config --global core.compression 0

then, clone just a part of your repo:

git clone --depth 1 <repo_URI>

and now "the rest"

git fetch --unshallow

But here is the trick: when you have a big repo, sometimes you must perform that step multiple times. So... again:

git fetch --unshallow

and so on.

Try multiple times. You will probably see that each time you run 'unshallow' you get more and more objects before the error.

And at the end, just to be sure:

git pull --all
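The repeated step can be automated; a minimal retry sketch, assuming a POSIX shell (per the answer above, successive attempts make progress until the full history arrives):

until git fetch --unshallow; do    # repeat until the unshallow fetch succeeds
    echo "fetch interrupted, retrying..."
done
git pull --all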

answered Oct 22 '22 by Mateusz Sęczkowski


1) You can initially download a single branch with only the latest commit (depth=1); this will significantly reduce the size of the repo to download and still let you work on the code base:

git clone --depth <Number> <repository> --branch <branch name> --single-branch

example:
git clone --depth 1 https://github.com/dundermifflin/dwightsecrets.git --branch scranton --single-branch


2) Later you can get all the commits (after this your repo will be in the same state as after a regular git clone):

git fetch --unshallow

or, if that's still too much, get only the last 25 commits:

git fetch --depth=25


Another way: git clone is not resumable, but you can first git clone onto a third-party server and then download the complete repo over HTTP/FTP, which is resumable.
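A sketch of that workaround using a git bundle (the server name and paths are placeholders):

# on a server with a good connection to the remote:
git clone --mirror <repo_url> repo.git
git -C repo.git bundle create ../repo.bundle --all    # pack the whole repo into one file

# on your machine; wget -c can resume an interrupted download:
wget -c http://yourserver.example.com/repo.bundle
git clone repo.bundle myrepo
git -C myrepo remote set-url origin <repo_url>        # point origin back at the real remote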

answered Oct 22 '22 by GorvGoyl


One potential technique is just to clone a single branch. You can then pull in more later. Do git clone [url_of_remote] --branch [branch_name] --single-branch.

Large repositories seem to be a major weakness with git. You can read about that at http://www.sitepoint.com/managing-huge-repositories-with-git/. That article mentions a git extension called git-annex that can help with large files; check it out at https://git-annex.branchable.com/. It helps by allowing git to manage files without checking their contents into git. Disclaimer: I've never tried it myself.
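A minimal git-annex sketch, untested here per the answer's disclaimer (the file name is a placeholder):

git annex init                  # turn the current repo into a git-annex repo
git annex add big-video.mp4     # track the file; its content stays outside git's object store
git commit -m "Add large file via git-annex"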

Some of the solutions at How do I clone a large Git repository on an unreliable connection? may also help.

EDIT: Since you just want the files, you may be able to try git archive. You'd use syntax something like:

git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=tar --output="file.tar" master

I tried to test on a repo in my AWS CodeCommit account, but it doesn't seem to allow it. Someone on BitBucket may be able to test. Note that on Windows you'd want to use zip rather than tar, and this all has to be done over an SSH connection, not HTTPS.
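For example, the zip variant would look like this (same placeholder username/repo as above):

git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=zip --output="file.zip" master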

Read more about git archive at http://git-scm.com/docs/git-archive

answered Oct 22 '22 by James Jones