
Git: What EXACTLY does "git pull" do?

Tags:

git

I know that git pull is actually a combination of git fetch and git merge, and that it basically brings in the repository as it is in the remote repository.

  1. But still, does it mean that after git pull my working tree will be identical to the remote repo?
  2. I found some cases where git pull doesn't change anything in my local repo or create any new commit. What is the explanation for this?
  3. Does it make sense that git pull makes changes at the index only?
  4. If it does, how can I make the changes at index move forward to the working tree?
CrazySynthax asked Jul 17 '17 06:07


2 Answers

The "exactly" part is really quite tough. It's often said, and it's mostly true, that git pull runs git fetch followed by either git merge or git rebase. In fact git pull, which used to be a shell script and is now a C program, quite literally ran git fetch first; now it directly invokes the C code that implements git fetch.

The next step, however, is quite tricky. Also, in a comment, you added this:

[fetch] brings changes from the remote repo. Where does it put them?

To understand this properly, you must understand Git's object system.

The Git object model, and git fetch

Each commit is a sort of standalone entity. Every commit has a unique hash ID: b06d364... or whatever. That hash ID is a cryptographic checksum of the contents of that commit. Consider, for instance:

$ git cat-file -p HEAD | sed 's/@/ /g'
tree a15b54eb544033f8c1ad04dd0a5278a59cc36cc9
parent 951ea7656ebb3f30e6c5e941e625a1318ac58298
author Junio C Hamano <gitster pobox.com> 1494339962 +0900
committer Junio C Hamano <gitster pobox.com> 1494339962 +0900

Git 2.13

Signed-off-by: Junio C Hamano <gitster pobox.com>

If you feed these contents (minus the 's/@/ /' substitution, but with the header that Git prepends to every object: the object type, a space, the size in bytes, and a NUL byte) to a SHA-1 checksum calculator, you will get the hash ID. This means that everyone who has this commit has the same hash ID for it.
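To see that header at work, here is a minimal sketch that rebuilds a commit's hash ID by hand in a throwaway repository. It assumes a POSIX shell, git on your PATH, GNU coreutils' sha1sum, and the default SHA-1 object format; the repository and file names are made up for the demo.

```shell
# Assumptions: POSIX shell, git on PATH, GNU sha1sum, default SHA-1 objects.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
echo hello > file.txt
git add file.txt
git commit -q -m 'first commit'

size=$(git cat-file -s HEAD)   # size in bytes of the raw commit object
# Header "commit <size>" plus a NUL byte, then the raw (not pretty-printed) commit
by_hand=$({ printf 'commit %s\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d' ' -f1)
by_git=$(git rev-parse HEAD)
echo "by hand: $by_hand"
echo "by git:  $by_git"
```

If you pipe the raw commit into git hash-object -t commit --stdin, Git does the same header-plus-SHA-1 computation for you.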

You can get the Git repository for Git and run git cat-file -p v2.13.0^{commit} to see this same data. Note: the tag v2.13.0 translates to 074ffb61b4b507b3bde7dcf6006e5660a0430860, which is a tag object; the tag object itself refers to the commit b06d364...:

$ git cat-file -p v2.13.0
object b06d3643105c8758ed019125a4399cb7efdcce2c
type commit
tag v2.13.0
[snip]

To work with a commit, Git must store the commit object—the item with the hash ID b06d364...—itself somewhere, and also its tree object and any additional objects that tree needs. These are the objects that you see Git counting and compressing during a git fetch or git push.

The parent line tells which commit (or, for a merge, which commits) precede this particular commit. To have a complete set of commits, Git must also have the parent commit(s). (A shallow clone, made with e.g. --depth, can deliberately omit various parents, whose IDs are recorded in a special file of "shallow grafts"; a normal clone will always have everything.)

There are four types of object in total: commits, (annotated) tags, trees, and what Git calls blob objects. Blobs mostly store the actual files. All of these objects reside in Git's object database. Git can then retrieve them easily by hash ID: git cat-file -p <hash>, for instance, displays them in a vaguely human-readable format. (Most of the time there is little that must be done other than de-compressing, though tree objects have binary data that must be formatted first.)
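A quick way to meet all four object types is to make one of each and ask git cat-file -t for its type. This is a sketch in a throwaway repository; the file name and the tag name v1.0 are invented for the example.

```shell
# Assumptions: POSIX shell, git on PATH; names are made up for the demo.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
echo hello > file.txt
git add file.txt
git commit -q -m 'first commit'
git tag -a -m 'annotated tag' v1.0         # -a makes a tag object (annotated tag)

t_commit=$(git cat-file -t HEAD)           # the commit itself
t_tree=$(git cat-file -t 'HEAD^{tree}')    # its top-level tree
t_blob=$(git cat-file -t HEAD:file.txt)    # the file's contents
t_tag=$(git cat-file -t v1.0)              # the annotated tag
echo "$t_commit $t_tree $t_blob $t_tag"

# -p shows each object readably; for a tree it decodes the binary entries
git cat-file -p 'HEAD^{tree}'
git cat-file -p HEAD:file.txt
```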

When you run git fetch (or have git pull run it for you), your Git obtains the hash IDs of some initial objects from another Git, then uses the Git transfer protocols to figure out what additional objects are required to complete your Git repository. If you already have some object, you do not need to fetch it again, and if that object is a commit object, you do not need any of its parents either.[1] So you get only the commits (and trees and blobs) that you do not already have. Your Git then stuffs these into your repository's object database.

Once the objects are safely saved away, your Git records the hash IDs in the special FETCH_HEAD file. If your Git is at least 1.8.4, it will also update any corresponding remote-tracking branch names at this time: e.g., it may update your origin/master.

(If you run git fetch manually, your Git obeys all the normal refspec update rules, as described in the git fetch documentation. It's the additional arguments passed to git fetch by git pull that inhibit some of these, depending on your Git version.)

That, then, is the answer to what I think is your real first question: git fetch stores these objects in Git's object database, where they may be retrieved by their hash IDs. It adds the hash IDs to .git/FETCH_HEAD (always), and often also updates some of your references—tag names in refs/tags/, and remote-tracking branch names in refs/remotes/.
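Here is a rough sketch of that, using two throwaway local repositories (the names upstream and clone, and the branch name main, are assumptions for the demo, not anything Git requires). After the fetch, the new commit object is in the clone's object database, FETCH_HEAD and origin/main point to it, and HEAD and the work-tree have not moved.

```shell
# Assumptions: POSIX shell, git on PATH; repo and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main      # pin the branch name on any Git version
echo v1 > file.txt
git add file.txt && git commit -q -m v1
cd .. && git clone -q upstream clone

# A new upstream commit that the clone does not have yet
cd upstream
echo v2 > file.txt && git commit -q -am v2
new=$(git rev-parse HEAD)

cd ../clone
before=$(git rev-parse HEAD)
git fetch -q origin
after=$(git rev-parse HEAD)                # unchanged: fetch does not move HEAD
tracking=$(git rev-parse origin/main)      # remote-tracking name now points at v2
git cat-file -t "$new"                     # the new commit object is stored locally
cat .git/FETCH_HEAD                        # hash IDs recorded by the fetch
cat file.txt                               # still "v1": work-tree untouched
```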


[1] Except, that is, to "unshallow" a shallow clone.


The rest of git pull

Running git fetch gets you objects, but does nothing to incorporate those objects into any of your work. If you wish to use the fetched commits or other data, you need a second step.
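Doing the two steps by hand looks roughly like this. It is a sketch with a made-up upstream/clone setup and branch name main; since the clone has no local commits, the merge here is a simple fast-forward.

```shell
# Assumptions: POSIX shell, git on PATH; repo and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add file.txt && git commit -q -m v1
cd .. && git clone -q upstream clone
cd upstream
echo v2 > file.txt && git commit -q -am v2
new=$(git rev-parse HEAD)

cd ../clone
git fetch -q origin        # step 1: objects, FETCH_HEAD, origin/main
git merge -q origin/main   # step 2: incorporate them (git rebase origin/main is the other choice)
cat file.txt               # now "v2"
```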

The two main options here are git merge and git rebase. The best way to understand them is to read about them elsewhere (other SO postings, other documentation, and so on). Both are, however, complicated commands, and there is one special case for git pull that is not covered by those two: in particular, you can git pull into a non-existent branch. You have a non-existent branch (which Git also calls an orphan branch or an unborn branch) in two cases:

  • in a new, empty repository (that has no commits), or
  • after running git checkout --orphan newbranch

In both cases, there is no current commit so there is nothing to rebase or merge. However, the index and/or work-tree are not necessarily empty! They are initially empty in a new, empty repository, but by the time you run git pull you could have created files and copied them into the index.

This kind of git pull has traditionally been buggy, so be careful: versions of Git before 1.8-ish will sometimes destroy uncommitted work. I think it's best to avoid git pull entirely here: just run git fetch yourself, and then figure out what you want to do. As far as I know, it's OK in modern Git—these versions will not destroy your index and work-tree—but I am in the habit of avoiding git pull myself.

In any case, even if you are not on an orphan/unborn/non-existent branch, it's not a great idea to try to run git merge with a dirty index and/or work-tree ("uncommitted work"). The git rebase command now has an automatic-stash option (rebase.autoStash), so you can have Git automatically run git stash save to create some off-branch commits out of any such uncommitted work. Then the rebase itself can run, after which Git can automatically apply and drop the stash.

Before Git 2.27, the git merge command did not have this automatic option (newer versions accept git merge --autostash), but of course you can always do it manually.
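A manual version, sketched with invented names: the clone has uncommitted edits to a file that the incoming commit also changes (on a different line), so we stash, merge, and pop.

```shell
# Assumptions: POSIX shell, git on PATH; repo, file and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
printf '%s\n' one two three four five > file.txt
git add file.txt && git commit -q -m base
cd .. && git clone -q upstream clone
cd upstream
printf '%s\n' ONE two three four five > file.txt   # upstream edits line 1
git commit -q -am 'upstream edit'

cd ../clone
printf '%s\n' one two three four FIVE > file.txt   # uncommitted local edit to line 5
git fetch -q origin
git stash                  # save the uncommitted work as off-branch stash commits
git merge -q origin/main   # without the stash, this would refuse to overwrite file.txt
git stash pop              # re-apply the saved work, then drop the stash
cat file.txt               # both edits: ONE ... FIVE
```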

Note that none of this works if you are in the middle of a conflicted merge. In this state, the index has extra entries: you cannot commit these until you resolve the conflicts, and you cannot even stash them (which follows naturally from the fact that git stash really makes commits). You can run git fetch, at any time, since that just adds new objects to the object database; but you cannot merge or rebase when the index is in this state.
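You can watch this happen with a deliberately conflicting merge. This sketch (throwaway repositories, made-up names) shows the extra index stages, the refused stash, and a fetch that still works.

```shell
# Assumptions: POSIX shell, git on PATH; repo, file and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add file.txt && git commit -q -m v1
cd .. && git clone -q upstream clone
cd upstream
echo upstream > file.txt && git commit -q -am 'upstream edit'

cd ../clone
echo local > file.txt && git commit -q -am 'local edit'   # same line, different text
git fetch -q origin
git merge -q origin/main || echo 'merge stopped: conflict in file.txt'
git ls-files -u               # stages 1, 2 and 3 of file.txt: the extra index entries
git stash || echo 'stash refused: unmerged index entries'
git fetch -q origin           # still fine: fetch only adds objects
```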

torek answered Oct 07 '22 12:10


  1. But still, does it mean that after "git pull" my working tree will be identical to the remote repo?

Not necessarily. Any local commits you have on the branch you're pulling will be merged with the upstream changes, so your branch can still contain work the remote does not have. Use git pull --rebase to put your local commits on top of the upstream commits instead; without --rebase, each such pull can add a merge commit, and you can get some pretty funky merge paths.
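A sketch of the difference, again with throwaway repositories and invented names: two identical clones each get one unpushed commit, then one pulls with a merge and the other with --rebase. Both end up with local.txt, which the remote does not have.

```shell
# Assumptions: POSIX shell, git on PATH; repo, file and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add file.txt && git commit -q -m v1
cd ..
git clone -q upstream merge-clone
git clone -q upstream rebase-clone
cd upstream
echo v2 > file.txt && git commit -q -am v2
new=$(git rev-parse HEAD)

# The same unpushed local commit in each clone
for d in merge-clone rebase-clone; do
  (cd "$tmp/$d" && echo mine > local.txt && git add local.txt && git commit -q -m 'local work')
done

cd "$tmp/merge-clone"
git pull -q --no-rebase --no-edit origin main   # fetch + merge: adds a merge commit
second_parent=$(git rev-parse 'HEAD^2')         # the upstream commit

cd "$tmp/rebase-clone"
git pull -q --rebase origin main                # fetch + rebase: linear history
rebased_parent=$(git rev-parse 'HEAD^')         # local work now sits on top of upstream
```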

  2. I found some cases where "git pull" doesn't change anything in my local repo or create any new commit?

If there are no new commits upstream (or your branch already contains all of them), nothing will change in your local copy either.
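A small sketch of both "nothing happens" cases, with made-up repository names: nothing new upstream, and a local branch that is ahead of upstream. In both, HEAD does not move and no commit is created.

```shell
# Assumptions: POSIX shell, git on PATH; repo and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add file.txt && git commit -q -m v1
cd .. && git clone -q upstream clone && cd clone

# Case 1: nothing new upstream
before=$(git rev-parse HEAD)
git pull -q --no-rebase origin main   # without -q, Git reports it is already up to date
after=$(git rev-parse HEAD)

# Case 2: we are ahead (one unpushed commit), upstream still has nothing new
echo mine > local.txt && git add local.txt && git commit -q -m 'local work'
ahead=$(git rev-parse HEAD)
git pull -q --no-rebase origin main
still=$(git rev-parse HEAD)
```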

  3. Does it make sense that "git pull" makes changes at the index only?

Not that I know of. Perhaps if it fails to merge with your local commits, but then you should at least get some errors along the way.

  4. If it does, how can I make the changes at index move forward to the work tree?

git pull :) Or git rebase <upstream> <branchname>. This will rebase the local commits in your branch <branchname> on top of the upstream commits in that branch.
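A sketch of the two-argument form, with invented names: git rebase <upstream> <branchname> first switches to <branchname>, then replays its local commits onto <upstream>. It only uses commits you have already fetched, so the fetch comes first.

```shell
# Assumptions: POSIX shell, git on PATH; repo, file and branch names are made up.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore any user/system Git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add file.txt && git commit -q -m v1
cd .. && git clone -q upstream clone
cd upstream
echo v2 > file.txt && git commit -q -am v2
new=$(git rev-parse HEAD)

cd ../clone
echo mine > local.txt && git add local.txt && git commit -q -m 'local work'
git checkout -q -b other          # start out on some other branch
git fetch -q origin               # rebase works only with commits already fetched
git rebase -q origin/main main    # switch to main, replay its local commit onto origin/main
branch=$(git symbolic-ref --short HEAD)
```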

harald answered Oct 07 '22 11:10