 

Fetch/Pull Part of Very Large Repository?

Tags: git

This is probably obvious and has been asked many times in different ways before, but I have not been able to find the answer after searching for some time.

Assume the following:

  • I have, say, a 500 GB disk at the local end;
  • I have a 100 terabyte remote repository, so cloning the entire repository is simply not feasible;
  • the working directory used to create the remote repository was composed of 1000 top-level directories DIR001, DIR002, ... DIR00N, each containing multiple subdirectories, with files only under the leaf subdirectories (e.g. DIR001/subdir1/fileA1 ... DIR001/subdir1/fileAN and DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN, ...);
  • I did NOT explicitly tag or branch directories DIR001, DIR002, ... DIR00N, or anything else for that matter;
  • I init a brand-new local git repository (see the sketch below).
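For concreteness, a minimal sketch of that starting state (the remote URL is hypothetical):

    # brand-new, empty local repository
    git init local-work
    cd local-work
    # point it at the huge remote
    git remote add origin https://example.com/huge-repo.git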

How do I efficiently pull or fetch the last committed versions of, say, DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository, and nothing else?

AND

How do I pull or fetch just the last committed version of a single file among DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository, and nothing else?

AND

How do I efficiently pull or fetch a previously committed version of a subset of said files, and nothing else?

Maybe fetch/pull is not the correct command for this.

Asked Sep 09 '10 by Gregg Leichtman

1 Answer

The answer to "Partial cloning" can help you start experimenting with shallow clones.
But a shallow clone will be limited:

  • to a certain depth, and/or to certain branches;
  • but not to certain files or directories (you can get a single file or directory through a sparse checkout, but you still have to fetch the full repo first!);
  • or even to a certain commit: Git 2.5 (Q2 2015) supports fetching a single commit! See "Pull a specific commit from a remote git repository" and the command sketch after this list.
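A minimal sketch of those options, assuming a Git 2.5+ client and a server configured to allow fetching arbitrary reachable commits (the URL, branch name, and SHA are hypothetical):

    # shallow clone: limit history depth and fetch a single branch
    git clone --depth 1 --single-branch --branch master https://example.com/huge-repo.git
    cd huge-repo

    # sparse checkout: limit the working tree (the objects are still fetched)
    git config core.sparseCheckout true
    echo "DIR001/subdir2/" > .git/info/sparse-checkout
    git read-tree -mu HEAD

    # Git 2.5+: fetch one specific commit
    # (needs uploadpack.allowReachableSHA1InWant=true on the server)
    git fetch origin <sha1-of-commit>
    git checkout FETCH_HEAD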

The real solution, though, would be to split the huge remote repo into submodules.
See "What are Git limits?" or "Git-style backup of binary files" for illustrations of this kind of situation.
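For instance, assuming the big repository were split into one repository per top-level directory (all URLs hypothetical), the local side could then fetch only what it needs:

    # a small "superproject" referencing each DIRxxx repo as a submodule
    git init superproject
    cd superproject
    git submodule add https://example.com/DIR001.git DIR001
    git commit -m "Add DIR001 as a submodule"

    # elsewhere: clone the (tiny) superproject, then initialize
    # only the submodules actually needed
    git clone https://example.com/superproject.git
    cd superproject
    git submodule update --init DIR001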


Update April 2015:

Git Large File Storage (LFS), announced by GitHub in April 2015, would make pull/fetch much more efficient.

The project is git-lfs (see git-lfs.github.com), and it has been tested with a server supporting it: lfs-test-server.
You store only pointer metadata in the Git repo, and the large files elsewhere.
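A minimal sketch of the LFS workflow (the "*.bin" pattern and file name are just examples):

    # one-time setup per machine
    git lfs install

    # tell LFS which files to manage; this records the rule in .gitattributes
    git lfs track "*.bin"

    # commit as usual: the repo stores small pointer files,
    # while the binaries go to the LFS store
    git add .gitattributes some-large-file.bin
    git commit -m "Track large binaries with Git LFS"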

https://cloud.githubusercontent.com/assets/1319791/7051226/c4570828-ddf4-11e4-87eb-8fc165e5ece4.gif

Answered Sep 21 '22 by VonC