I have a Git repository on Bitbucket which is more than 4 GB.
I can't clone the repository using the normal git clone command; it looks like it's working for a long time but then rolls back.
I also can't download the repository as a zip from the Bitbucket interface, as it reports:
"Feature unavailable: This repository is too large for us to generate a download."
Is there any way to download a GIT repository incrementally?
To download from GitHub, navigate to the top level of the project (SDN in this case); a green "Code" button will be visible on the right. Choose the Download ZIP option from the Code drop-down menu. That ZIP file will contain the entire repository content, including the area you wanted.
Maximum repository size is 10 GB. The total repository size is limited to 10 GB. You will receive warning messages as your repository grows, so you're aware you are approaching the limit. Eventually, if the repository size exceeds the limit, you will receive an error message and the push will be blocked.
First get a shallow clone of depth 1, then fetch a few more commits at a time with the git fetch command. Once you have fetched a sufficient number of commits, you can fetch all of the remaining ones in one go. This way you can clone a large repository incrementally.
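A minimal sketch of that sequence (the repository URL and the depth values are placeholders):

git clone --depth 1 <repo_url>   # shallow clone containing only the most recent commit
cd <repo_dir>
git fetch --depth=100            # deepen the history by a fixed number of commits
git fetch --depth=1000           # repeat with larger depths as needed
git fetch --unshallow            # finally fetch everything that remains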
Due to the distributed nature of Git, a repository generally contains all history. That means that even if you deleted that single 1 GB file from your repository two years ago, it will be transferred on every clone since then. Moreover, Git's compression algorithms are optimized for source code.
Any old files over 100 MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data. Once this is done, your repo will be much smaller and should clone without problems.
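This appears to describe the BFG Repo-Cleaner; a minimal sketch, assuming bfg is installed and is run against a fresh mirror clone (the repository URL is a placeholder):

git clone --mirror https://bitbucket.org/username/reponame.git
bfg --strip-blobs-bigger-than 100M reponame.git
cd reponame.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive   # clean away the dead data

Note that this rewrites history, so existing clones of the repository would need to be re-cloned afterwards.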
A large Git repository can be a repository with a deep history and a large number of commits. Cloning this kind of repository and/or running certain local Git commands on it (e.g. git blame) can be slow. In some cases a "shallow" clone, which only clones the most recent commits (git clone --depth <depth> <url>), can be a viable solution.
If you don't need to pull the whole history, you can specify the number of revisions to clone:
git clone <repo_url> --depth=1
Of course, this might not help if you have a particularly large file in your repository.
For me, the approach described in this answer worked perfectly: https://stackoverflow.com/a/22317479/6332374, but with one little improvement because of the big repo:
First:
git config --global core.compression 0
then, clone just a part of your repo:
git clone --depth 1 <repo_URI>
and now "the rest"
git fetch --unshallow
But here is the trick: when you have a big repo, you sometimes must perform that step multiple times. So, again:
git fetch --unshallow
and so on.
Try it multiple times. You will probably see that each time you run the unshallow fetch you get more and more objects before the error.
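A small shell sketch of that retry idea, assuming a POSIX shell and that the shallow clone from the previous step already exists:

until git fetch --unshallow; do
    echo "unshallow fetch failed, retrying..."
    sleep 5
done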
And at the end, just to be sure:
git pull --all
1) You can initially download a single branch containing only the latest commit (depth=1); this will significantly reduce the size of the repo to download and still let you work on the code base:
git clone --depth <Number> <repository> --branch <branch name> --single-branch
example: git clone --depth 1 https://github.com/dundermifflin/dwightsecrets.git --branch scranton --single-branch
2) Later you can get all the commits (after this, your repo will be in the same state as after a full git clone):
git fetch --unshallow
or, if that's still too much, get only the last 25 commits:
git fetch --depth=25
Another way: git clone is not resumable, but you can first run git clone on a third-party server and then download the complete repo over HTTP/FTP, which is resumable.
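A minimal sketch of that idea, assuming you have shell access to an intermediate server with a good connection; the host name and paths are placeholders:

# on the intermediate server
git clone --mirror https://bitbucket.org/username/reponame.git
tar czf reponame.tar.gz reponame.git

# on your machine: wget -c resumes an interrupted download
wget -c http://intermediate.example.com/reponame.tar.gz
tar xzf reponame.tar.gz
git clone reponame.git reponame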
One potential technique is just to clone a single branch. You can then pull in more later. Do:
git clone [url_of_remote] --branch [branch_name] --single-branch
Large repositories seem to be a major weakness of Git. You can read about that at http://www.sitepoint.com/managing-huge-repositories-with-git/. The article mentions a Git extension called git-annex that can help with large files; check it out at https://git-annex.branchable.com/. It helps by allowing Git to manage files without checking the files into Git. Disclaimer: I've never tried it myself.
Some of the solutions at How do I clone a large Git repository on an unreliable connection? also may help.
EDIT: Since you just want the files, you may be able to try git archive. You'd use syntax something like:
git archive --remote=ssh://[email protected]/username/reponame.git --format=tar --output="file.tar" master
I tried to test on a repo in my AWS CodeCommit account, but it doesn't seem to allow it. Someone on Bitbucket may be able to test. Note that on Windows you'd want to use zip rather than tar, and this all has to be done over an SSH connection, not HTTPS.
Read more about git archive at http://git-scm.com/docs/git-archive.
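For the Windows case mentioned above, a variant of the same command using the zip format (the repository URL remains the placeholder from the example above):

git archive --remote=ssh://[email protected]/username/reponame.git --format=zip --output="file.zip" master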