In short, is there a way for me to efficiently (space-wise) specify the exact objects I want from a Git server that supports the smart protocol but not the filter-spec?
More context:
Because GitHub's pack protocol lacks filter-spec support, I've been trying to construct a way to fetch from a multi-gigabyte repository in which a single commit also comprises multiple gigabytes. My idea was to use fetch-pack requests (upload-pack on the server side) that specify a want of only a single commit object, then read the tree that commit references, fetch that tree object in another request, and from there manually specify which blob and tree objects I want. What I've discovered, though, is that the pack protocol seems to operate from the perspective of delivering as much data as it can for a particular commit or tree that you "want".
What this means for what I'm doing is that any time I specify a commit or a tree hash, I get not just the commit or tree object(s) but also every object they contain. This also happens when using the deepen setting to limit how many commits I want: 0 yields nothing and 1 yields the result described above. I have verified that specifying a want of just a blob does result in a pack file containing only that blob, so that part works as expected.
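For reference, the kind of request described above can be sketched as follows. This is a minimal illustration that assumes the original (version 0) upload-pack wire format, where every line is framed as a pkt-line; the helper names pkt_line and build_upload_pack_request are hypothetical and not part of Git, and a real client would also negotiate capabilities advertised by the server.

```python
FLUSH_PKT = b"0000"  # special pkt-line that ends a section of the request


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line.

    The 4 hex digits give the total length, counting the 4 header bytes.
    """
    return b"%04x" % (len(payload) + 4) + payload


def build_upload_pack_request(wants, capabilities=(), depth=None) -> bytes:
    """Build a want/deepen/done request body (hypothetical helper).

    wants:        list of 40-character hex object ids
    capabilities: capability names, sent on the first want line only
    depth:        optional value for a "deepen" line
    """
    if not wants:
        raise ValueError("at least one want is required")
    parts = []
    for index, oid in enumerate(wants):
        line = "want " + oid
        if index == 0 and capabilities:
            line += " " + " ".join(capabilities)
        parts.append(pkt_line((line + "\n").encode("ascii")))
    if depth is not None:
        parts.append(pkt_line(b"deepen %d\n" % depth))
    # No "have" lines: the client claims to hold nothing yet.
    return b"".join(parts) + FLUSH_PKT + pkt_line(b"done\n")


if __name__ == "__main__":
    example_oid = "a" * 40  # placeholder id, not a real object
    print(build_upload_pack_request([example_oid], depth=1))
```

Nothing in this request can say "send only this object"; the want line names a starting point, and the server decides what is reachable from it.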
git fetch --all fetches all branches of all remotes. git fetch origin fetches all branches of the remote origin.
The git fetch command downloads commits, files, and refs from a remote repository into your local repo. Fetching is what you do when you want to see what everybody else has been working on.
You just need to run git fetch, which will retrieve all branches and updates, and then run git checkout <branch>, which will create a local copy of the branch because all branches have already been fetched to your system.
Below is a list of commonly used options when working with git fetch: --all - Fetch all remotes. --append (-a) - Append the ref names and object names of fetched refs to the existing contents of .git/FETCH_HEAD instead of overwriting it. --depth=<depth> - Limit fetching to the specified number of commits from the tip of each remote branch history.
What you're requesting isn't possible in the Git protocol unless the filter functionality is enabled.
The Git protocol is and always has been designed to efficiently exchange a set of commits. The way that Git implements the protocol on the server side for fetches is that it marks the client's have commits as uninteresting and then walks the revisions from what's requested down to the uninteresting points, including all the necessary objects reachable between those points. This approach necessarily requires that the points you're walking be commits.
It is possible to send a request for a tree object, but the server side won't do what you expect. You'll end up with that tree and everything reachable from it (all the blobs and other trees) in the pack, which is going to be significantly more data than you want. Again, this makes perfect sense if you think about how the Git protocol works: the user has requested all of the objects reachable from this point.
You can specify that you have certain tree objects so as to exclude them, but of course that requires that you know what they are, which in this case you don't. Even so, you'd still receive the blobs that exist within that level of the hierarchy.
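The behaviour described so far can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the object store, its ids, and the function names are invented for illustration, and the real server walks revisions rather than computing a plain set difference. It does show why a want of a commit or tree drags in everything beneath it, while a want of a blob yields just that blob.

```python
# Toy object store: id -> (type, ids it references).
# A commit references its root tree and its parent commits.
OBJECTS = {
    "c2": ("commit", ["t2", "c1"]),
    "c1": ("commit", ["t1"]),
    "t2": ("tree", ["b_readme", "t_src2"]),
    "t1": ("tree", ["b_readme", "t_src1"]),
    "t_src2": ("tree", ["b_main2"]),
    "t_src1": ("tree", ["b_main1"]),
    "b_readme": ("blob", []),
    "b_main1": ("blob", []),
    "b_main2": ("blob", []),
}


def reachable(roots):
    """Return every object id reachable from the given starting ids."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(OBJECTS[oid][1])
    return seen


def objects_to_send(wants, haves=()):
    """Everything reachable from wants, minus what the haves already cover."""
    return reachable(wants) - reachable(haves)


if __name__ == "__main__":
    print(sorted(objects_to_send(["c2"])))              # whole history
    print(sorted(objects_to_send(["c2"], ["c1"])))      # only what c2 added
    print(sorted(objects_to_send(["t2"])))              # tree plus everything below
    print(sorted(objects_to_send(["t2"], ["t_src2"])))  # blobs at that level remain
    print(sorted(objects_to_send(["b_readme"])))        # a single blob
```

In this model, excluding the subtree t_src2 still leaves b_readme in the result, which mirrors the point above about blobs at the same level of the hierarchy.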
The filter functionality just adjusts the objects that are included in the pack, so you can specify that only the one tree object is to be included by excluding everything below its depth. These arguments are passed to git rev-list --objects so that the pack generation will exclude the things you're not interested in. Otherwise, the default is to include every reachable object within the range you've requested.
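To make the effect of a depth-based filter concrete, here is a small sketch loosely modelled on the tree:<depth> filter spec, under the assumption that the root tree sits at depth 0 and that objects at a depth greater than or equal to the limit are omitted. The TREES table and the list_objects function are hypothetical; the real filtering happens inside Git's object listing, not in code like this.

```python
from collections import deque

# Toy tree layout: tree id -> entries. Any id that is not a key is a blob.
TREES = {
    "root": ["README", "src"],
    "src": ["main.c", "lib"],
    "lib": ["util.c"],
}


def list_objects(root_tree, max_depth=None):
    """List objects under root_tree in breadth-first order.

    With max_depth set, objects at depth >= max_depth are omitted,
    so nothing beneath them is visited either.
    """
    listed = []
    seen = set()
    queue = deque([(root_tree, 0)])
    while queue:
        oid, depth = queue.popleft()
        if oid in seen:
            continue
        if max_depth is not None and depth >= max_depth:
            continue
        seen.add(oid)
        listed.append(oid)
        for child in TREES.get(oid, []):
            queue.append((child, depth + 1))
    return listed


if __name__ == "__main__":
    print(list_objects("root"))               # default: every reachable object
    print(list_objects("root", max_depth=1))  # only the one tree object
    print(list_objects("root", max_depth=2))  # root tree and its direct entries
```

With a limit of 1 the listing contains only the root tree, which is the "only the one tree object" case described above; without a limit, every reachable object is listed.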