
How are different branches stored locally from git on my disk?

I have only one version repository sitting on my local HDD but multiple branches on Github. Shouldn't there be copies of code per branch? What version of code do I have sitting on my local disk?

Sarun Luitel asked Dec 03 '18 03:12


1 Answer

Shouldn't there be copies of code per branch?

No. This is not how Git's branch names work.

What Git stores is not, in a sense, branches at all. Git stores commits. Commits are nearly everything in Git. Where branches come in is much later, where they have a rather minor role. They're important to humans, of course, because the actual name of any commit is a big ugly hash ID that is useless to humans. Branch names let us use names instead of hash IDs. But Git mostly really cares about the commits, with their hash IDs.

I have only one version repository sitting on my local HDD but multiple branches on Github.

Generally, the repository on your local drive is a clone of the repository on GitHub. Or, we could equally say that the repository on GitHub is a clone of the repository on your local drive. Neither repository is "better" than the other; they just live on different computers, and may hold slightly different sets of commits and, more likely, different branch names (but since Git cares more about the commits, that doesn't matter so much to Git).

To understand how this works, start with the commit. Each commit—named by some big ugly hash ID, such as 8a0ba68f6dab2c8b1f297a0d46b710bb9af3237a—stores a complete snapshot of some set of files. Along with that snapshot, each commit has some metadata, some data about the commit. If you made the commit yourself, for instance, the commit stores your name and email address, and the time-stamp for when you made the commit. And, each commit generally stores the hash ID of its previous or parent commit. (The parent of 8a0ba68f6dab2c8b1f297a0d46b710bb9af3237a above is 15cc2da0b5aaf5350f180951450e0a5318f7d34d, for instance.)
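As a rough picture of what one commit holds, here is a toy record in Python. Everything in it is made up for illustration: the class name Commit, the hash_id method, the author details and the timestamps are all hypothetical, and the ID is just a SHA-1 over a text rendering of the fields, not Git's real object format.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    snapshot: tuple          # (path, content) pairs: the complete set of files
    author: str              # metadata: who made the commit
    email: str
    timestamp: int           # metadata: when it was made
    parent: Optional[str]    # hash ID of the previous commit, None for the first

    def hash_id(self) -> str:
        # Toy ID only: hash a text rendering of the fields.
        return hashlib.sha1(repr(self).encode()).hexdigest()


first = Commit((("README", "hello\n"),),
               "A. Author", "author@example.com", 1543806720, None)
second = Commit((("README", "hello, world\n"),),
                "A. Author", "author@example.com", 1543806780, first.hash_id())

# The second commit finds the first one through the stored parent ID.
print(second.parent == first.hash_id())
```

The point of the sketch is only that the parent link is stored inside the later commit, so a commit can lead you to its predecessor but not the other way around.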

Git uses these parent links to find commits. Let's consider a tiny repository that you created recently, in which you have made just three commits. Each commit has some big ugly hash ID, but let's use single uppercase letters instead. The first commit you ever made is therefore commit A. Because it is the first commit, it has no parent.

Then, with commit A made, you made commit B. It has A as its parent. Likewise, you used B to make C, so C stores B's hash ID.

Whenever we have the hash ID of some commit, we say that we can point to that commit. So C points to B, and B points to A. If we draw this, we get:

A <-B <-C

Now, with simple single uppercase letters and this drawing, it's obvious that we need to start at C and work backwards. This is what Git does, but the way that Git finds commit C is to use a branch name. Since we just started this repository we probably are still using master, so let's draw that in:

A <-B <-C   <--master

The name master holds the hash ID of commit C, so that Git can find C. From there, Git can work backwards, to B and then A.
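If it helps, the whole arrangement so far fits in two small tables. This is a toy model in Python, not how Git stores anything on disk; the names parents, branches and history are mine, and the single letters stand in for real hash IDs.

```python
# Each commit records only its parent's ID (None for the very first commit).
parents = {"A": None, "B": "A", "C": "B"}

# A branch name is just a label that holds one commit ID.
branches = {"master": "C"}


def history(branch):
    """Start at the commit the name points to and walk backwards."""
    commit = branches[branch]
    seen = []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen


print(history("master"))  # ['C', 'B', 'A']
```

Note that there is no way to go forwards from A to B here. Without the name master to supply C, the walk has nowhere to start.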

If we want to add a new commit, we run git checkout master, which gets us a copy of C that we can work on, and remembers that we're now on master which is commit C. We do our work, git add files as needed, and run git commit. Git makes a new commit D (from whatever is in the index, which I'm not going to define here), and sets D to point back to C:

A--B--C   <--master
       \
        D

Now the tricky part happens: since we just created D, and we're on master, Git now updates master with whatever the real hash ID is for new commit D. So now master points to D, not C:

A--B--C
       \
        D   <--master

and we can straighten out the kink in the drawing (and I like to loosen up the arrow a bit too):

A--B--C--D   <-- master
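The two steps of making a commit (link the new commit back to the old tip, then move the name) can be sketched with the same kind of toy tables. The function name commit and the dictionaries are assumptions of this sketch, and it ignores the index and the snapshot entirely.

```python
parents = {"A": None, "B": "A", "C": "B"}
branches = {"master": "C"}


def commit(branch, new_id):
    """Record new_id with the current tip as its parent, then move the name."""
    parents[new_id] = branches[branch]   # D points back to C
    branches[branch] = new_id            # master now points to D


commit("master", "D")
print(parents["D"], branches["master"])  # C D
```

Nothing about commits A, B or C changed. The only thing that moved is the one entry in branches.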

Now suppose we decide to add a new branch. We run, e.g., git checkout -b develop. What this does is just add a new name, but keep all the same commits. The new name, develop, points to the commit we choose, which defaults to the one we're using right now, i.e., commit D:

A--B--C--D   <-- master, develop

Now we need a way to draw which branch name we're using. To do this we'll attach the word HEAD (in all uppercase) to one of the two branches. Since we did git checkout to attach HEAD to the new develop, let's draw that:

A--B--C--D   <-- master, develop (HEAD)

Now it's time to make another new commit. Without doing anything else, we modify some files, use git add to copy the updated files into the index, and run git commit to make a new commit, which we'll call E (but which gets, as usual, some incomprehensible hash ID). When Git updates a branch name, the name it updates is the one that has HEAD attached to it, so the graph now looks like this:

A--B--C--D   <-- master
          \
           E   <-- develop (HEAD)
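Here is the same idea with HEAD added, again as a toy Python model rather than anything Git actually runs. The variable head and the two function names are hypothetical; head simply remembers which branch name is attached.

```python
parents = {"A": None, "B": "A", "C": "B", "D": "C"}
branches = {"master": "D"}
head = "master"  # the branch name that HEAD is attached to


def checkout_new_branch(name):
    """Roughly `git checkout -b name`: add a name at the current commit, attach HEAD."""
    global head
    branches[name] = branches[head]
    head = name


def commit(new_id):
    """Only the branch with HEAD attached moves to the new commit."""
    parents[new_id] = branches[head]
    branches[head] = new_id


checkout_new_branch("develop")
commit("E")
print(branches)  # {'master': 'D', 'develop': 'E'}
```

Creating develop added one entry to branches and no commits at all, and the later commit left master where it was.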

At this point, suppose I clone your repository (directly from your machine, or from an exact copy you send to GitHub, with the same commits and same branch names). My Git will at first do this:

A--B--C--D   <-- origin/master
          \
           E   <-- origin/develop

Here, I don't have any branches. I have all the same commits, but what is remembering their ends is not branch names, but rather remote-tracking names. Instead of master, I have origin/master to remember commit D's hash ID. Instead of develop, I have origin/develop to remember E's hash ID.

As the last step of my clone, my own Git tries to git checkout master. Instead of failing because I don't have a master, this actually creates my master, using my origin/master that my Git copied from your Git's master. So now I have:

A--B--C--D   <-- origin/master, master (HEAD)
          \
           E   <-- origin/develop

If I now run git checkout develop, my Git will look through my repository and will not find a develop, but it will find an origin/develop. My Git will then create a new name, develop, pointing to commit E, and attach HEAD to that name instead of to master:

A--B--C--D   <-- origin/master, master
          \
           E   <-- origin/develop, develop (HEAD)

Note that no commits have been copied. Git merely added a new name, pointing to some existing commit. The existing commit was already there, in my local repository.
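The clone-then-checkout sequence can be sketched the same way. This is a simplified Python model with hypothetical function names (clone, checkout); real Git does considerably more, but the bookkeeping of names follows the description above.

```python
# "Your" repository, as a toy model.
your_parents = {"A": None, "B": "A", "C": "B", "D": "C", "E": "D"}
your_branches = {"master": "D", "develop": "E"}


def clone(src_parents, src_branches):
    """Copy every commit; keep the source's branch names only as origin/* names."""
    commits = dict(src_parents)
    remote_names = {"origin/" + name: tip for name, tip in src_branches.items()}
    return commits, remote_names, {}  # no local branch names yet


def checkout(name, branches, remote_names):
    """Create the local name from origin/<name> if it is missing; return the new HEAD."""
    if name not in branches:
        branches[name] = remote_names["origin/" + name]
    return name


commits, remote_names, branches = clone(your_parents, your_branches)
head = checkout("master", branches, remote_names)    # the last step of the clone
head = checkout("develop", branches, remote_names)   # my later git checkout develop
print(branches, len(commits))  # {'master': 'D', 'develop': 'E'} 5
```

The commit count stays at five throughout: both checkouts only added names.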

You connect two Git repositories with fetch and push

This is how you get updates from the GitHub repository. If they have some new commit(s) that you don't have, you can run git fetch in your Git repository. Your Git uses the origin name, which your Git created during your original git clone operation, to find the URL for GitHub. Your Git then calls up GitHub and your Git and their Git have a little conversation.

With git fetch, your Git asks their Git about their branch names and the final commits on each of those names. Their Git might say: my master points to some commit with big ugly hash—let's just call this H, rather than trying to guess the actual hash ID. Your Git looks at your repository and says to itself: I don't have H, I'd better ask for it. Your Git asks their Git about H; they say H's parent is G, whose parent is F, whose parent is D. You both have D, so this part of the conversation is done. Their Git might also say: my develop points to commit E. Your Git already has E, so this part of the conversation is done too. There are no other names left to worry about, so now your Git has their Git send over commits F, G, and H, which your Git saves away in your repository:

           F--G--H   <-- origin/master
          /
A--B--C--D   <-- master
          \
           E   <-- origin/develop, develop (HEAD)

Note that aside from adding new commits to your own repository, the only other thing your Git has done is update all your origin/* names so that they match the other Git's names. That is, your origin/master has moved from the shared commit D to the now-shared commit H.

Fetch is always safe, but git push is different

It's always safe to run git fetch: this connects your Git to some other Git, gets any new commits from them, and updates your remote-tracking names. Since those names are just remembering their work, not your work, that's safe. If you have made new commits, your new commits are still on your branches, which are not their branches, other than whatever commits you both have in common.

When you use git push, your Git calls up their Git and the two Gits have a very similar conversation. Your Git offers their Git some new commits, if you have new ones. But then, instead of your Git updating your remote-tracking names, your Git offers their Git a polite request: Please, if it's OK, update your master to match my master, or please, if it's OK, update your develop to match my develop—or perhaps even both. (You can push as many branch names as you like in one go.)

Suppose that after the earlier git fetch, you had this:

           F--G--H   <-- origin/master
          /
A--B--C--D   <-- master
          \
           E   <-- origin/develop, develop (HEAD)

You stayed on your develop and made some new commits, which we'll call I and J, so now you have this:

           F--G--H   <-- origin/master
          /
A--B--C--D   <-- master
          \
           E   <-- origin/develop
            \
             I--J   <-- develop (HEAD)

But, unbeknownst to you, someone else added a new commit to their develop. That commit has a hash ID that you don't have anywhere. Let's call it K in their Git, so that they have this:

           F--G--H   <-- master
          /
A--B--C--D
          \
           E--K   <-- develop

You send them I and J, so that now they have this:

           F--G--H   <-- master
          /
A--B--C--D
          \
           E--K   <-- develop
            \
             I--J   <-- (polite request to make develop go here)

If they were to accept your request, and move their develop, what happens to their commit K? They end up with this:

           F--G--H   <-- master
          /
A--B--C--D
          \
           E--K   ???
            \
             I--J   <-- develop

As we described above, the way Git works with commits and branch names is that it uses the branch name to find the last commit, and then works backwards. Starting from develop, their Git would find J. From J they would go back to I, then E, then D, C, B, and A. Nothing in that walk reaches K, and no other name points to it. What this means is that commit K is lost!

As a result, their Git will say no to your polite request: No, if I moved my develop, I would lose some commit(s). (Your Git reports this as "rejected" and "not a fast-forward".)
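The check their Git makes can be sketched as an ancestry test: the move is allowed only if the current tip is reachable by walking back from the proposed tip. The Python below is a toy model with hypothetical names (is_ancestor, receive_push), and the returned strings are placeholders rather than Git's actual output.

```python
# Their repository after someone else added K to develop.
their_parents = {"A": None, "B": "A", "C": "B", "D": "C", "E": "D", "K": "E",
                 "F": "D", "G": "F", "H": "G"}
their_branches = {"master": "H", "develop": "K"}


def is_ancestor(parents, old, new):
    """True if walking backwards from new reaches old."""
    commit = new
    while commit is not None:
        if commit == old:
            return True
        commit = parents[commit]
    return False


def receive_push(branch, new_tip, new_commits):
    """Store the offered commits, then move the name only if nothing gets lost."""
    their_parents.update(new_commits)
    if not is_ancestor(their_parents, their_branches[branch], new_tip):
        return "rejected"
    their_branches[branch] = new_tip
    return "accepted"


print(receive_push("develop", "J", {"I": "E", "J": "I"}))  # rejected
print(their_branches["develop"])                           # K
```

Walking back from J goes I, E, D, C, B, A and never meets K, so the request is refused and their develop stays on K.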

In this case, what you need to do is to git fetch to pick up their new commit K, then do something in your own repository to accommodate it. That's for another set of questions, though (all of which are already answered on StackOverflow).

torek answered Sep 18 '22 05:09