What does GIT PUSH do exactly?

Tags:

git

git-push

I can't seem to find a good explanation of this.

I know what git pull does:

1) a fetch, i.e. all the extra commits from the server are copied into the local repo and the origin/master branch pointer moves to the end of the commit chain

2) a merge of the origin/master branch into the master branch, with the master branch pointer moving to the newly created commit while the origin/master pointer stays put.

I assume git push does something very similar, but I don't know for sure. I believe it does one of these, or something similar, or something else (?):

  • copies all local commits and makes a merge there (the reverse of what git pull does); but in this case, the server does not have my local master branch, so I can't see what it is merging

OR

  • merges my master branch into the origin/master, pushing the resulting commit to the server and linking it next to the existing end-commit, also moving the server's master; this doesn't seem right because then my local origin/master is not in sync with the server's.

I'm currently using git for basic operations so I'm doing fine, but I want to fully understand these internals.

Bogdan Alexandru asked Sep 23 '14 21:09




2 Answers

Assuming you already understand git's "objects" model (your commits and files and so on are all just "objects in the git database", with "loose" objects—those not packed up to save space—stored in .git/objects/12/34567... and the like)...

You are correct: git fetch retrieves objects "they" (origin, in this case) have that you don't, and sticks labels on them: origin/master and the like. More specifically, your git calls up theirs on the Internet-phone (or any other suitable transport) and asks: what branches do you have, and what commit IDs are those? They have master and the ID is 1234567..., so your git asks for 1234567... and any other objects needed that you don't already have, and makes your origin/master point to commit object 1234567....

The part of git push that is symmetric here is this: your git calls up their git on the same Internet-phone as usual, but this time, instead of just asking them about their branches, your git tells them about your branches and your git repository objects, and then says: "How about I get you to set your master to 56789ab...?"

Their git takes a look at the objects you sent over (the new commit 56789ab... and whatever other objects you have that they didn't, that they would need to take it). Their git then considers the request to set their master to 56789ab....

As Chris K already answered, there is no merging happening here: your git simply proposes that their git overwrite their master with this new commit-ID. It's up to their git to decide whether to allow that.

If "they" (whoever they are) have not set up any special rules, the default rule that git uses here is very simple: the overwrite is allowed if the change is a "fast forward". It has one additional feature: the overwrite is also allowed if the change is done with the "force" flag set. It's usually not a good idea to set the force flag here, as the default rule, "only fast forwards", is usually the right rule.

The obvious question here is: what exactly is a fast forward? We'll get to that in a moment; first I need to expand a bit on labels, or "references" to be more formal.

Git's references

In git, a branch, or a tag, or even things like the stash and HEAD are all references. Most of them are found in .git/refs/, a sub-directory of the git repository. (A few top-level references, including HEAD, are right in .git itself.) All a reference is, is a file[1] containing an SHA-1 ID like 7452b4b5786778d5d87f5c90a94fab8936502e20. SHA-1 IDs are cumbersome and impossible for people to remember, so we use names, like v2.1.0 (a tag in this case, version 2.1.0 of git itself) to save them for us.

Some references are—or at least are intended to be—totally static. The tag v2.1.0 should never refer to something other than the SHA-1 ID above. But some references are more dynamic. Specifically, your own local branches, like master, are moving targets. One special case, HEAD, is not even a target of its own: it generally contains the name of the moving-target branch. So there's one exception for "indirect" references: HEAD usually contains the string ref: refs/heads/master, or ref: refs/heads/branch, or something along those lines; and git does not (and cannot) enforce a "never change" rule for references. Branches in particular change a lot.
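To make the "a reference is just a file" idea concrete, here is a minimal Python sketch that follows HEAD to a commit ID by reading loose reference files. The function name resolve_ref and its max_depth parameter are my own inventions for illustration, and the sketch deliberately ignores packed references, so treat it as a teaching aid rather than a replacement for git rev-parse.

```python
from pathlib import Path


def resolve_ref(git_dir, name="HEAD", max_depth=5):
    """Follow a loose reference file until it yields an object ID.

    Hypothetical helper for illustration only: packed references
    are not consulted, so real tooling should use `git rev-parse`.
    """
    path = Path(git_dir) / name
    for _ in range(max_depth):
        content = path.read_text().strip()
        if content.startswith("ref: "):
            # Indirect reference, e.g. "ref: refs/heads/master":
            # the file holds the name of another reference.
            path = Path(git_dir) / content[len("ref: "):]
        else:
            # Direct reference: the file holds the ID itself.
            return content
    raise ValueError("too many levels of indirection")
```

Calling resolve_ref(".git") inside a repository whose current branch is still stored as a loose file should return the same ID as the branch tip; if the branch has been packed, the read fails, which is exactly why the real commands are the safer interface.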

How do you know if a reference is supposed to change? Well, a lot of this is just by convention: branches move and tags don't. But you should then ask: how do you know if a reference is a branch, or a tag, or what?

Name spaces of references: refs/heads/, refs/tags/, etc.

Other than the special top-level references, all of git's references are in refs/ as we already noted above. Within the refs/ directory (or "folder" if you're on Windows or Mac), though, we can have a whole collection of sub-directories. Git has, at this point, four well-defined subdirectories: refs/heads/ contains all your branches, refs/tags/ contains all your tags, refs/remotes/ contains all your "remote-tracking branches", and refs/notes/ contains git's "notes" (which I will ignore here as they get a bit complicated).

Since all your branches are in refs/heads/, git can tell that these should be allowed to change, and since all your tags are in refs/tags/, git can tell that these should not.
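As a rough illustration of how the name space alone tells you what kind of reference you are looking at, here is a small Python sketch. The function classify_ref and the labels it returns are hypothetical; git itself does not expose anything by that name.

```python
def classify_ref(full_name):
    """Return (kind, short_name) based only on the reference's name space.

    Hypothetical helper: the four prefixes are the ones listed above,
    anything else (HEAD and other top-level references) is "other".
    """
    namespaces = {
        "refs/heads/": "branch",
        "refs/tags/": "tag",
        "refs/remotes/": "remote-tracking branch",
        "refs/notes/": "note",
    }
    for prefix, kind in namespaces.items():
        if full_name.startswith(prefix):
            return kind, full_name[len(prefix):]
    return "other", full_name
```

For example, classify_ref("refs/remotes/origin/master") yields the kind "remote-tracking branch" with the short name "origin/master", which is the name you normally type.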

Automatic motion of branches

When you make a new commit, and are on a branch like master, git will automatically move the reference. Your new commit is created with its "parent commit" being the previous branch-tip, and once your new commit is safely saved away, git changes master to contain the ID of the new commit. In other words, it makes sure that the branch name, the reference in the heads sub-directory, always points to the tip-most commit.
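The order of operations (write the commit first, move the label second) can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the TinyRepo class, its fields, and the way it derives an ID are stand-ins, not git's actual object format.

```python
import hashlib


class TinyRepo:
    """Toy model: commits live in a dict, references in a name -> ID map."""

    def __init__(self):
        self.commits = {}                  # commit ID -> {"parents": [...], "message": str}
        self.refs = {}                     # e.g. "refs/heads/master" -> commit ID
        self.head = "refs/heads/master"    # HEAD holds the name of the current branch

    def commit(self, message):
        tip = self.refs.get(self.head)     # previous branch tip, if there is one
        parents = [tip] if tip else []
        # Stand-in for git's real hashing: any stable ID works for the model.
        new_id = hashlib.sha1(repr((parents, message)).encode()).hexdigest()
        self.commits[new_id] = {"parents": parents, "message": message}
        # Only after the commit is stored does the branch name move to it.
        self.refs[self.head] = new_id
        return new_id
```

After two calls to commit(), refs["refs/heads/master"] holds the second ID, and that commit's parent list holds the first: the name always ends up on the tip-most commit.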

(In fact, the branch, in the sense of a collection of commits that is part of the commit-graph stored in the repository, is a data structure made out of the commits in the repository. Its only connection with the branch name is that the tip commit of the branch itself is stored in the reference label with that name. This is important later, if and when branch names are changed or erased as the repository grows many more commits. For now it's just something to keep in mind: there's a difference between the "branch tip", which is where the "branch name" points, and the branch-as-a-subset-of-commit-DAG. It's a bit unfortunate that git tends to lump these different concepts under a single name, "branch".)

What exactly is a fast forward?

Usually you see "fast forward" in the context of merge, often with the merge done as the second step in a git pull. But in fact, "fast forwarding" is actually a property of a label move.

Let's draw a little bit of a commit graph. The little o nodes represent commits, and each one has an arrow pointing left, left-and-up, or left-and-down (or in one case, two arrows) to its parent (or parents). To be able to refer to three of them by name I'll give them uppercase letter names instead of o. Also, this character-based artwork doesn't have arrows, so you have to imagine them; just remember that they all point left or left-ish, just like the arrows drawn from the three names.

                o - A   <-- name1
              /
    o - o - o - o - B   <-- name2
          \       /
            o - C       <-- name3

When you ask git to change a reference, you simply ask it to stick a new commit ID into the label. In this case, these labels live in refs/heads/ and are thus branch names, so they are supposed to be able to take on new values.

If we tell git to put B into name1, we get this:

                o - A
              /
    o - o - o - o - B   <-- name1, name2
          \       /
            o - C       <-- name3

Note that commit A now has no name, and the o to the left of it is found only by finding A ... which is hard since A has no name. Commit A has been abandoned, and these two commits have become eligible for "garbage collection". (In git, there's a "ghost name" left behind in the "reflog", that keeps the branch with A around for 30 days in general. But that's a different topic entirely.)

What about telling git to put B into name3? If we do that next, we get this:

                o - A
              /
    o - o - o - o - B   <-- name1, name2, name3
          \       /
            o - C

Here, commit C still has a way to find it: start at B and work down-and-left, to its other (second) parent commit, and you find commit C. So commit C is not abandoned.

Updating name1 like this is not a fast-forward, but updating name3 is.

More specifically, a reference-change is a "fast forward" if and only if the object—usually a commit—that the reference used to point-to is still reachable by starting from the new place and working backwards, along all possible backwards paths. In graph terms, it's a fast-forward if the old node is an ancestor of the new one.
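The ancestor test is easy to express in code. Below is a small Python sketch that encodes the graph drawn above as a child-to-parents map and checks both label moves. The node names o1 through o6 for the unnamed commits are my own labels, and is_ancestor and is_fast_forward are illustrative helpers, not git functions.

```python
# Child -> parents map for the drawing above (o1..o6 are made-up names
# for the unnamed "o" commits; B is the merge with two parents).
PARENTS = {
    "o1": [], "o2": ["o1"], "o3": ["o2"], "o4": ["o3"],
    "o5": ["o3"], "A": ["o5"],
    "o6": ["o2"], "C": ["o6"],
    "B": ["o4", "C"],
}


def is_ancestor(parents, old, new):
    """True if `old` can be reached from `new` by walking parent links."""
    stack, seen = [new], set()
    while stack:
        node = stack.pop()
        if node == old:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents[node])
    return False


def is_fast_forward(parents, old_tip, new_tip):
    """A label move is a fast-forward iff the old tip stays reachable."""
    return is_ancestor(parents, old_tip, new_tip)


print(is_fast_forward(PARENTS, "A", "B"))  # moving name1 from A to B: False
print(is_fast_forward(PARENTS, "C", "B"))  # moving name3 from C to B: True
```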

Making a push be a fast-forward, by merging

Branch-name fast-forwards occur when the only thing you do is add new commits; but also when, if you've added new commits, you've also merged-in whatever new commits someone else added. That is, suppose your repo has this in it, after you've made one new commit:

                 o   <-- master
               /
    ...- o - o       <-- origin/master

At this point, moving origin/master "up and right" would be a fast-forward. However, someone else comes along and updates the other (origin) repo, so you do a git fetch and get a new commit from them. Your git moves your origin/master label (in a fast-forward operation on your repo, as it happens):

                 o   <-- master
               /
    ...- o - o - o   <-- origin/master

At this point, moving origin/master to master would not be a fast-forward, as it would abandon that one new commit.

You, however, can do a git merge origin/master operation to make a new commit on your master, with two parent commit IDs. Let's label this one M (for merge):

                 o - M  <-- master
               /   /
    ...- o - o - o   <-- origin/master

You can now git push this back to origin and ask them to set their master—which you are calling origin/master—equal to your (new) M, because for them, that's now a fast-forward operation!
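Here is a rough Python model of the receiving side applying the default rule to this exact scenario. It is a sketch under stated assumptions: receive_push, the commit names base, mine, theirs and M, and the dictionary-based "remote" are all hypothetical, and a real server may layer its own rules on top of the default.

```python
def is_ancestor(parents, old, new):
    """True if `old` can be reached from `new` by walking parent links."""
    stack, seen = [new], set()
    while stack:
        node = stack.pop()
        if node == old:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def receive_push(remote_refs, parents, ref, new_tip, force=False):
    """Model of the default rule: accept a fast-forward, or anything if forced.

    Assumption: creating a reference that does not exist yet is accepted.
    """
    old_tip = remote_refs.get(ref)
    if old_tip is None or force or is_ancestor(parents, old_tip, new_tip):
        remote_refs[ref] = new_tip
        return True
    return False   # rejected: the update would abandon commits on the remote


# "mine" is your new commit, "theirs" is the commit someone else pushed,
# and M is the merge you made afterwards (all names are made up).
parents = {"base": [], "mine": ["base"], "theirs": ["base"], "M": ["mine", "theirs"]}
remote = {"refs/heads/master": "theirs"}

print(receive_push(remote, parents, "refs/heads/master", "mine"))  # False
print(receive_push(remote, parents, "refs/heads/master", "M"))     # True
```

Passing force=True in the first call would also be accepted, at the cost of abandoning the other person's commit on the remote, which is why the default rule is usually the right one.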

Note that you can also do a git rebase, but let's leave that for a different stackoverflow posting. :-)


[1] In fact, git references always start out as individual files in various sub-directories, but if a reference doesn't get updated for a long while, it tends to get "packed" (along with all the other mostly-static references) into a single file full of packed references. This is just a time-saving optimization, and the key here is not to depend on the exact implementation, but rather to use git's rev-parse and update-ref commands to extract the current SHA-1 from a reference, or update a reference to contain a new SHA-1.

torek answered Sep 21 '22 19:09


It only performs a copy, no merge.

More specifically, it copies the parts of the object store that are in the local repo/branch and are missing from the remote side. This includes commit objects, trees and blobs, along with the ref updates themselves.
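The "copy what is missing" step amounts to a set difference over the commit graph. The Python sketch below shows the idea for commits only; trees and blobs are left out, the helper names reachable and commits_to_send are hypothetical, and it assumes the remote's tip is already known locally (for instance from an earlier fetch).

```python
def reachable(parents, tip):
    """All commits reachable from `tip` by walking parent links."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def commits_to_send(parents, local_tip, remote_tip=None):
    """Commits the local side has that the remote side lacks.

    Assumption: `remote_tip` is a commit present in `parents`,
    or None when the remote branch does not exist yet.
    """
    have = reachable(parents, remote_tip) if remote_tip else set()
    return reachable(parents, local_tip) - have
```

With a linear history a, b, c, d where the remote is at b and the local branch is at d, commits_to_send returns c and d; nothing is merged on either side.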

Tags are a notable exception: they are not pushed unless you pass the --tags flag.

The blog post "git is simpler than you think" has more detail.

Chris K answered Sep 23 '22 19:09