
What are the differences between git clone --shared and --reference?


Both options update .git/objects/info/alternates to point to the source repository. This can be dangerous, which is why the documentation carries a warning note on both options.

The --shared option does not copy the objects into the clone. This is the main difference.

The --reference option takes an additional repository parameter. Using --reference still copies the objects into the destination during the clone; however, objects that are already available in the reference repository are taken from there rather than from the source. Passing --reference the path to a repository on a faster or local device can therefore reduce network time and I/O on the source repository.

See for yourself

Create a --shared clone and a --reference clone. Count the objects in each using git count-objects -v. You'll notice the shared clone has no objects, and the reference clone has the same number of objects as the source. Also compare the size of each on your file system. If you move the source and run git log in both clones, the log is unavailable in the shared clone but works fine in the reference clone.
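
The experiment above can be reproduced with a throwaway repository. This is only a minimal sketch, assuming git is installed; the directory names (src, shared, ref) and the demo commit identity are made up for illustration. The source is cloned by local path here, so the numbers may differ from what you see when cloning over a network.

```shell
#!/bin/sh
set -eu

# Scratch area; every name below is hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A tiny source repository with a single commit.
git init -q src
echo hello > src/file.txt
git -C src add file.txt
git -C src -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial commit"

# One clone of each kind, both borrowing from src.
git clone -q --shared src shared
git clone -q --reference src src ref

# Both clones record the source's object directory as an alternate.
cat shared/.git/objects/info/alternates
cat ref/.git/objects/info/alternates

# Compare object counts and on-disk size.
shared_count=$(git -C shared count-objects -v | sed -n 's/^count: //p')
echo "shared clone loose objects: $shared_count"
git -C ref count-objects -v
du -sh src/.git shared/.git ref/.git
```

To try the last step, rename src before the script exits and run git log in each clone.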


The link in the comments to your question is really a clearer answer: --reference implies --shared. The point of --reference is to optimise network I/O during the initial clone of a remote repository.

Contrary to the answer above, I find that the --shared and --reference repositories -- from the same source -- have the same size and both have zero objects. Of course, if you use --reference with some other repository that shares a common source, the size and object count will reflect the difference between the repositories. Note that in both cases we are not saving space in the work tree, only in .git/objects.

There is some nuance to maintaining this setup going forward - read the thread for more details. Essentially it sounds like the two should be treated as public repositories, with care around history re-writing in the presence of repacking/pruning/garbage collection.

The workflow around maintaining an optimal disk-space usage after the initial clone seems to be:

  1. pull source
  2. repack source
  3. pull secondary
  4. git gc in secondary
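
The four steps above can be sketched end to end against a local stand-in for the remote. This is an illustration under assumptions, not a recommended procedure: the repository names (seed, upstream.git, source, secondary), the commit helper, and the file:// URL are all hypothetical and exist only to keep the demo offline.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to log.txt and commit it with a demo identity.
commit() {
    echo "$2" >> "$1/log.txt"
    git -C "$1" add log.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

# "upstream.git" stands in for the real remote.
git init -q seed
commit seed "first"
git clone -q --bare seed upstream.git
url="file://$work/upstream.git"

# The source clone, and a secondary clone that borrows objects from it.
git clone -q "$url" source
git clone -q --reference source "$url" secondary

# New history appears upstream.
commit seed "second"
git -C seed push -q "$url" HEAD

# 1. pull source, 2. repack source, 3. pull secondary, 4. git gc in secondary
git -C source pull -q --ff-only
git -C source repack -q -a -d
git -C secondary pull -q --ff-only
git -C secondary gc -q

upstream_head=$(git -C upstream.git rev-parse HEAD)
secondary_head=$(git -C secondary rev-parse HEAD)
echo "upstream:  $upstream_head"
echo "secondary: $secondary_head"
git -C secondary count-objects -v
```

The order matters because the secondary can only avoid storing objects that the source already holds, so the source is updated and repacked first.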

Probably best to read the discussion in that thread though.

You can add an alternate to an existing repository by putting the absolute path to the source's objects directory into secondary/.git/objects/info/alternates and running git gc (many people use git repack -a -d -l, which git gc runs for you).

You can remove an alternate by running git repack -a -d (no -l) in the secondary and then removing the line from the alternates file. As described in the thread, it is possible to have more than one alternate.
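
Adding and then removing an alternate might look like the following. Treat it as a sketch under assumptions: the repository names (source, secondary) and the file:// URL are hypothetical, and because the alternates file holds a single line here, the whole file is deleted rather than edited.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A source repository and an ordinary, independent clone of it.
git init -q source
echo hello > source/file.txt
git -C source add file.txt
git -C source -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial commit"
git clone -q "file://$work/source" secondary

# Add the alternate: absolute path to the source's objects directory, then gc.
mkdir -p secondary/.git/objects/info
echo "$work/source/.git/objects" > secondary/.git/objects/info/alternates
git -C secondary gc -q
git -C secondary count-objects -v

# Remove the alternate: repack without -l so borrowed objects are copied in,
# then drop the (single-line) alternates file.
git -C secondary repack -q -a -d
rm secondary/.git/objects/info/alternates

# The secondary should now be usable without the source.
rm -rf source
git -C secondary fsck
commits=$(git -C secondary rev-list --count HEAD)
echo "commits still reachable: $commits"
```

Deleting the source at the end is only there to show that the secondary no longer depends on it.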

I've not used this much myself, so I don't know how error-prone it is to manage.


The link in the comments to your question is now dead.

https://www.oreilly.com/library/view/git-pocket-guide/9781449327507/ch06.html has some great information on the subject. Here is some of what is there:

First, we make a bare clone of the remote repository, to be shared locally as a reference repository (hence named “refrep”):
$ git clone --bare http://foo/bar.git refrep

Then, we clone the remote again, but this time giving refrep as a reference:
$ git clone --reference refrep http://foo/bar.git

The key difference between this and the --shared option is that you are still tracking the remote repository, not the refrep clone. When you pull, you still contact http://foo/, but you don’t need to wait for it to send any objects that are already stored locally in refrep; when you push, you are updating the branches and other refs of the foo repository directly.

Of course, as soon as you and others start pushing new commits, the reference repository will become out of date, and you’ll start to lose some of the benefit. Periodically, you can run git fetch --all in refrep to pull in any new objects. A single reference repository can be a cache for the objects of any number of others; just add them as remotes in the reference:

$ git remote add zeus http://olympus/zeus.git
$ git fetch zeus
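
The quoted commands need a real server, so here is an offline approximation. It is a sketch under assumptions: remote.git and the file:// URL stand in for http://foo/bar.git, and seed is a hypothetical scratch repository used only to give the remote some history.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# "remote.git" stands in for http://foo/bar.git.
git init -q seed
echo hello > seed/file.txt
git -C seed add file.txt
git -C seed -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial commit"
git clone -q --bare seed remote.git
url="file://$work/remote.git"

# Bare reference repository, then a working clone that borrows from it.
git clone -q --bare "$url" refrep
git clone -q --reference refrep "$url" bar

# The working clone tracks the remote, not refrep ...
origin_url=$(git -C bar config --get remote.origin.url)
echo "origin: $origin_url"

# ... while its alternates file points at refrep's object store.
cat bar/.git/objects/info/alternates

# Refresh the cache from every remote configured in refrep.
git -C refrep fetch -q --all
```

Pulls and pushes in bar go to the URL shown for origin; refrep is consulted only as a local source of objects.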