If a git fetch is cancelled halfway, will it resume?

Tags: git, resume

Fetching a git repository (by executing "git fetch repository_URL") can sometimes take many hours, depending on the size of the repository and the network speed.

If for some reason the user cancels the fetch midway and later fetches the same repository again, in exactly the same environment where the last fetch was cancelled, how will the fetch behave?

Will it resume the fetch where it left off?

Asked Mar 22 '15 by Dinesh Maurya



1 Answer

No (2015), or "maybe soon" (Q4 2018): git clone/fetch/pull operations don't have a "resume" capability.

Since then:

  • Q4 2018: Git 2.18 and 2.19 introduce the wire protocol v2 (see the example after this list).
  • GitLab supports it starting Oct. 2018.
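
As a quick illustration (the remote name "origin" is just an example): a client and server that both run Git 2.18+ can opt in to protocol v2 per command or globally. Note that v2 by itself does not make a fetch resumable; it is the extensible negotiation layer on which such a feature could be built.

# Use the wire protocol v2 for a single command (Git 2.18+):
git -c protocol.version=2 fetch origin
# Or enable it for all commands:
git config --global protocol.version 2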

2015:

The only alternative, mentioned in this thread, is gitolite (a Perl script that manages ACLs -- Access Control Lists -- for your repos, as well as providing other utilities around git access).

gitolite can be configured to keep a "Git bundle" up to date (see the git-bundle manual), which can then be made downloadable via rsync or HTTP and fetched with an rsync client or an HTTP client that supports resuming.

This technique keeps the "download everything" and "make a repo out of the downloaded stuff" steps distinct, and the first step can be carried out over any number of attempts.
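
A rough sketch of what that setup amounts to (the server path, bundle location, and project name below are placeholders, not gitolite's actual configuration syntax):

# On the server (e.g. from a cron job or a gitolite-triggered hook), refresh a bundle of all refs:
git --git-dir=/srv/git/project.git bundle create /var/www/bundles/project.bundle --all
# On the client, download with a resumable transport, then verify and clone from the bundle:
rsync --partial --progress server:/var/www/bundles/project.bundle .
# (or: wget -c https://server/bundles/project.bundle)
git bundle verify project.bundle
git clone project.bundle project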

The downsides are obvious:

  1. This requires special setup on the server side.
  2. It's unclear what happens if someone updates the repository while someone else is downloading its bundle, or if an update happens between adjacent download attempts.

Regarding the resumable feature of git clone/fetch (mentioned in "How to complete a git clone for a big project on an unstable connection?"), there is a recent discussion (March 2016) on the git mailing list.

  • One approach is for the server to produce bundles, which can be downloaded (with resume, using wget -c!) and added to the local repo (since a bundle is a single file you can clone from, as if it were a git repo).
    See "Cloning Linux from a bundle"

That is:

# Download the bundle; -c resumes a partial download:
wget -c https://cdn.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/clone.bundle
git bundle verify clone.bundle
# ... (output)
# clone.bundle is okay
git clone clone.bundle linux
# Now, point the origin to the live git repository and get the latest changes:
cd linux
git remote remove origin
git remote add origin https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git pull origin master
  • The other approach is an actual implementation of a resumable git clone, discussed in this thread.

We could implement resumable clone by making a bit of a hybrid of the smart and dumb HTTP protocol.

  1. A git clone eventually calls into the transport layer, and git-remote-curl will probe for the info/clone URL; if the resource fails to load, everything goes through the traditional codepath.

  2. When git-remote-curl detects the support of dumb clone, it does the "retry until successfully download the pack data fully" dance internally, tentatively updates the remote tracking refs, and then pretends as if it was asked to do an incremental fetch. If this succeeds without any die(), everybody is happy.

  3. If the above step 2. has to die() for some reason (including impatience hitting CTRL-C), leave the $GIT_DIR, the downloaded .info file, and the partially downloaded .pack file in place.
    Tell the user that the cloning can be resumed and how (a sketch of that download dance follows this list).
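
To make the proposal concrete, here is roughly what the "retry until successfully download the pack data fully" dance of step 2 would boil down to if done by hand; the info/clone resource is the one hypothesized in the proposal above, so current servers do not serve it, and the URL is only illustrative:

# Resume a partial download of the pack behind the hypothetical info/clone resource (-C - resumes):
curl -C - -o clone.pack https://git.example.com/project.git/info/clone
# Once the pack is complete, import its objects and finish with a normal incremental fetch:
git init project
cd project
git unpack-objects < ../clone.pack
git fetch https://git.example.com/project.git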

Note that this is for a resumable clone, not a resumable fetch:

the initial "clone" and subsequent incremental "fetch" are orthogonal issues.

Because the proposed update to "clone" has a much larger payoff than the proposed change to "fetch", i.e.

  • The amount of data transferred is much larger, hence the chance of the network timing out in a poor network environment is much higher, and the need for resuming is much greater.
  • Not only does the approach make "clone" resumable, helping clients, it also helps the server offload the bulk transfer to a CDN.

and it does much less damage to the existing code, i.e.

  • We do not have to pessimize the packing process only to discard the bulk of the bytes that were generated, as the proposed approach for "fetch" would.
  • The area where new code is needed is well isolated, and the switch to the new protocol happens very early in the exchange without sharing code with the existing codepath; these properties make it less risky to introduce regressions.

To avoid an HTTP-only feature in the new protocol, there is a proposal for a "v2" protocol that would let the two sides exchange capabilities before the ref advertisement. Then the client, having seen the server's resumable URL, knows whether or not to proceed with the advertisement.
See stefanbeller/gitprotocol2-10 in July 2017.
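
With a recent enough client you can watch that capability exchange happen before any refs are advertised (the repository URL is just an example; GIT_TRACE_PACKET dumps the raw protocol traffic):

# Dump the raw protocol exchange; with protocol.version=2 the server lists its capabilities first:
GIT_TRACE_PACKET=1 git -c protocol.version=2 ls-remote https://github.com/git/git 2>&1 | head -n 20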

Answered by VonC