
Where is git Fetch data stored?

Tags:

git

I'm new to Git and I'm trying to understand its concepts.

refspec: as far as I understand, it maps the server's branches to the "local" destination branches.

I wrote "local" because when we run 'git fetch', the fetched branches are not merged into our working copy or staging area; they are kept elsewhere.

So my questions are:

A) Are they stored at .git/refs/remotes/origin/(branches)?

B) If yes, is the content of each .git/refs/remotes/origin/(branch) a reference/ID to the place where the changes are on the server?

In one of the branches in .git/refs/remotes/origin/ I find this: 761db53af177ecfd9c9b14511360537e041ebed7

C) When we do a commit, where are the changes saved?

Are they stored in .git/objects?

Nelssen asked Jul 02 '16 21:07




1 Answer

These are answered elsewhere, piecemeal, but I'll go ahead and put in a short[1] all-in-one answer here:

refspec - as far as I understood maps the server branches to the "local" destination branches

This is an effect, not a definition.

The definition of a refspec is simply a pair of reference names, separated by a colon :, and with an optional leading plus sign +. The reference name on the left of the colon is the source and the name on the right is the destination. If the leading plus sign is present, the refspec says that Git should do a "forced update", i.e., update a reference even if it's not a fast-forward (branch references) or is naturally disallowed (for tags).
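To make the three parts concrete, here is a minimal sketch that takes a refspec apart using only POSIX shell parameter expansion. It is an illustration of the definition above, not Git's own parser, and it assumes a refspec that is complete on both sides; the variable names are mine.

```shell
#!/bin/sh
# Split a refspec into: optional "+" (force), source, destination.
# Illustrative only: Git's real parser also handles refspecs with a side omitted.
refspec='+refs/heads/*:refs/remotes/origin/*'

case $refspec in
  +*) force=yes; rest=${refspec#+} ;;   # leading plus sign: forced update
  *)  force=no;  rest=$refspec ;;
esac

src=${rest%%:*}   # everything left of the first colon: the source
dst=${rest#*:}    # everything right of the first colon: the destination

echo "force=$force"
echo "src=$src"
echo "dst=$dst"
```

Reference names cannot contain a colon, which is why splitting at the first colon is enough here.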

Refspecs are used for both git fetch and git push. Omitting one of the parts has different effects in the different commands, so it's easiest to work with refspecs that are complete on both sides. When used with git fetch, the source is indeed the server's reference (usually a branch-name, but you may copy tags, notes, or any other reference the server exposes—and by default, servers expose everything). The destination is, as you surmised, your own local reference, likewise usually a branch-name.

Your first question:

A) Are they stored at .git/refs/remotes/origin/(branches)?

By default, yes. But note several caveats:

  • To find the .git directory, use git rev-parse --git-dir, which also works when you are in a sub-directory. (By contrast, git rev-parse --show-toplevel and --show-cdup locate the top of the work-tree, not the .git directory.)
  • Reference names may be packed, in which case they are not stored in their own individual files.
  • Git is (internally) evolving a pluggable back-end for ref-names, so even the above is subject to change: to resolve or update ref-names from scripts, you should use Git's "plumbing" commands, namely git rev-parse, git symbolic-ref, and git update-ref.
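The caveats above can be tried out in a throwaway repository. This is a sketch under a few assumptions: git is installed, the identity passed with -c and the ref name refs/remotes/origin/demo are made up for the demonstration, and the remote-tracking ref is created by hand with git update-ref only because there is no real server to fetch from.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q --allow-empty -m 'first commit'
commit=$(git rev-parse HEAD)

# Normally git fetch creates this ref; here it is made by hand for the demo.
git update-ref refs/remotes/origin/demo "$commit"

git rev-parse --git-dir      # where the repository data lives

git pack-refs --all          # move loose refs into the packed-refs file
if [ -e .git/refs/remotes/origin/demo ]; then
  echo "ref still has its own file"
else
  echo "ref has no file of its own any more"
fi

# The plumbing command resolves the name either way.
resolved=$(git rev-parse refs/remotes/origin/demo)
echo "$resolved"
```

The point of the last step is that scripts which read files under .git/refs directly would miss the packed ref, while git rev-parse does not care how the name is stored.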

The "by default" part of the qualifier above is because the destination name is actually a configuration item. For the remote named origin, git fetch defaults to using the configured remote.origin.fetch value:

$ cd [path to git repo for git]
$ git config --get-all remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*
$ cd [path to git repo for FreeBSD]
$ git config --get-all remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*
+refs/notes/*:refs/notes/origin/*
$ 

As the above shows, you can have more than one fetch configuration for any given remote.

(For a remote named $rem, use git config --get-all remote.$rem.fetch to see its default fetch refspecs.)
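If you have no repository with a second fetch line handy, the following sketch sets one up in a temporary directory. The URL is a placeholder that is never contacted, and the notes refspec is simply copied from the FreeBSD example above.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# "git remote add" writes the default fetch refspec for the new remote.
git remote add origin https://example.invalid/repo.git   # placeholder URL

# Add a second fetch refspec alongside the default one.
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/origin/*'

fetchspecs=$(git config --get-all remote.origin.fetch)
echo "$fetchspecs"
```

Using --add (rather than a plain git config set) is what keeps the first value in place instead of overwriting it.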

As the above also shows, fetch refspecs may contain wildcard * characters. These work much like shell (sh, bash, zsh, etc.) "glob" characters, except that in older versions of Git[2] they are restricted to appearing immediately after a slash / (e.g., refs/heads/pr* would be forbidden), and of course when they appear on the destination side they just expand back to whatever they matched on the source side, so they must be paired up.
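As a rough illustration of how the paired wildcards behave, here is a small sketch in plain shell. map_ref is a hypothetical helper of my own, not a Git command, and it only handles the common case where the * is the last character on both sides.

```shell
#!/bin/sh
# map_ref REF SRC DST: print where REF would land under the refspec SRC:DST,
# or return non-zero if REF does not match SRC. Handles a trailing "*" only.
map_ref() {
  ref=$1 src=$2 dst=$3
  prefix=${src%\*}                       # "refs/heads/*" -> "refs/heads/"
  case $ref in
    "$prefix"*)
      matched=${ref#"$prefix"}           # the part the "*" matched
      printf '%s\n' "${dst%\*}$matched"  # substitute it on the destination side
      ;;
    *)
      return 1                           # this refspec does not cover REF
      ;;
  esac
}

map_ref refs/heads/feature/x 'refs/heads/*' 'refs/remotes/origin/*'
# prints: refs/remotes/origin/feature/x
```

Note that the * happily matches across slashes, so a server branch named feature/x ends up as origin/feature/x.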

B) If yes is the content of each .git/refs/remotes/origin/(branch) a reference/ID to the place where the changes are in the server?

Sort of: it is an object identifier. (The exact form of this question implies a false assumption, which we'll get to in a moment.)

Because refs/remotes/origin/$branch is a remote-tracking branch name (and is therefore a copy of a refs/heads/ name-space branch name), it must in fact point to[3] a commit object. Note that the location of the name, in refs/, determines the type of reference. It's the type-of-reference that implies the object type: most names generally should point to commit objects, but tag names may point to any of the four basic object types. (The four types are commit, tree, blob, and tag or "annotated tag"; tag names usually should point directly to commits, or to annotated tag objects. A commit points to parent commits and to one tree; trees point to blobs and additional trees; and blobs contain file contents. See also the answer to question C.)

C) When we do a commit, where are the changes saved?
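You can ask Git for the type of any object with git cat-file -t. The sketch below builds a one-commit throwaway repository (the file name and the identity passed with -c are made up) and follows the chain from commit to tree to blob.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > greeting.txt
git add greeting.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'add greeting'

commit_type=$(git cat-file -t HEAD)              # the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')       # the tree the commit points to
blob_type=$(git cat-file -t HEAD:greeting.txt)   # the blob the tree points to
echo "$commit_type $tree_type $blob_type"

git cat-file -p HEAD             # includes a "tree <id>" line
git cat-file -p 'HEAD^{tree}'    # one line per entry: mode, type, id, file name
```

The first echo prints "commit tree blob"; the two cat-file -p calls show the IDs that do the "pointing".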

This is where you have gone off the rails, into the weeds: Git saves snapshots, not changes. But you are mostly in the right area!
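One way to see the snapshot behaviour for yourself is to make two commits that change only one of two files, then look at what the second commit holds. This is a sketch in a throwaway repository; snap is a small helper of my own and the file names are arbitrary.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
snap() {   # hypothetical helper: commit with a fixed demo identity
  git -c user.name=Example -c user.email=example@example.invalid \
      commit -q -m "$1"
}

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
snap first

echo changed > a.txt             # only a.txt changes
git add a.txt
snap second

git ls-tree HEAD                 # the second commit still lists BOTH files

b_first=$(git rev-parse 'HEAD~1:b.txt')
b_second=$(git rev-parse 'HEAD:b.txt')
echo "$b_first"
echo "$b_second"                 # same ID: the unchanged file is shared, not re-stored
```

So each commit is a full snapshot, but an unchanged file costs nothing extra because both snapshots name the same blob.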

A picture is worth many words, so I'll include a link to the Pro Git book chapter.

are they stored in the .git/objects ?

This is where Git stores its "object files", yes. A new commit means a new object, and the new object goes into that directory (the actual file name is complextificationified[4]: the first two hex digits of the object ID name a sub-directory and the remaining digits name the file, to make Git run faster on Linux boxes, where "fat" directories are slow). As noted above, the new commit points to a tree (usually a new tree object, rather than some existing tree object) and the tree provides the mapping from sane user file names (and file modes, specifically the execute bit) to Git blob-object names. The tree and blob objects are also stored in .git/objects.
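To see where a single loose object lands, you can write a blob by hand with git hash-object -w and then look for its file. This sketch assumes a freshly created repository using the default loose-object storage; with the default SHA-1 object format the ID of the content "hello" plus a newline is ce013625030ba8dba906f756967f9e9ca394464a, but the code does not depend on that.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store the content as a blob object and capture its ID.
id=$(printf 'hello\n' | git hash-object -w --stdin)
echo "$id"

dir=$(printf '%s' "$id" | cut -c1-2)    # first two hex digits: sub-directory
file=$(printf '%s' "$id" | cut -c3-)    # remaining digits: file name
ls ".git/objects/$dir/$file"

git cat-file -p "$id"                   # prints the stored content back
```

The file on disk is zlib-compressed, so use git cat-file rather than cat to read it.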

Complicating this, however, objects may be packed into a "pack file". (At this point, Git does start to do delta-compression, much like other Version Control Systems, but with the remarkable twist that the delta compression need not be applied against a previous version of "the same" file, or even of a file at all. In practice, the heuristic for choosing delta chains currently associates by file names and sizes, but in principle, we could compress a file against a tree, or vice versa, if that worked well.)
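Packing is easy to observe with git count-objects -v before and after a git gc. The sketch below uses a throwaway repository with one commit; the exact counts depend on your Git version and configuration, so treat the numbers you see as illustrative.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > greeting.txt
git add greeting.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'add greeting'

git count-objects -v         # before: objects are loose ("count"), none "in-pack"

git gc -q                    # pack the reachable objects

in_pack=$(git count-objects -v | sed -n 's/^in-pack: //p')
echo "in-pack: $in_pack"

git cat-file -t HEAD         # objects remain readable once packed
```

After the gc, the individual files under .git/objects/xx/ are replaced by a pack file and its index under .git/objects/pack/.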


[1] Short as in "time to write", not "length of answer". :-) If I took longer I could write a shorter answer...

[2] Somewhere around Git 2.5 or so, but I don't have time to look up the exact version at which the restrictions on wildcard placement were relaxed.

[3] The phrase "point to" here is short for "contain the ID of", with the implication that we (or Git) can follow this ID to find the object itself, as if following a pointing arrow: this way to the egress object.

[4] Apologies to non-native-English speakers: this is not a real word, but rather a portmanteau of "complex" and "complication" with the verb-forming suffix "-ify" added twice with glue, filing, and sanding to fit, just to complicate the complications. :-)

torek answered Sep 22 '22 14:09