
Why does a conflicted file appear both staged and unstaged in Git?


I've been experimenting with Git for the purposes of writing a Git tutorial. I created a branch, modified the file on both branches, then merged the branch back to master to generate a conflict. What I'm curious about is why the conflicted file appears to be both "staged" and "unstaged". If I click the file in either place the diff window shows the exact same information.

Chev asked Jul 23 '19 23:07


2 Answers

A file being both staged and unstaged is common in Git. Git recognizes different parts of a changed file as 'hunks', hence the buttons 'Stage hunk' and 'Discard hunk' in the screenshot. This UI situation means that some of the file is staged, and some of it is not. You could commit here and only the staged changes (the ones in the top list) would be committed.

I'm not sure why each version of the file shows the same information; that's surprising. Sourcetree is probably having trouble showing the conflict in a way that makes sense.

To proceed, you'll want to unstage everything, resolve your merge conflict, and then commit the corrected file. This means removing these lines:

<<<<<<< HEAD
=======
>>>>>>> new-feature

And keeping the code you want from above or below the ======= line (or both).

Jake Worth answered Nov 11 '22 23:11


Some of this depends on your GUI, but the command line git status command does this too—in a slightly different way—so not all of it is GUI-specific. The real answer to why a file can appear to be both staged and unstaged is this: Calling a file "staged" or "unstaged" is sort of a lie. It's not a mean and vicious lie. It's more the kind of nice lie people use to soften a hard truth. It's mostly harmless, and mostly helps people get through the day.

Unfortunately, in the case of a merge conflict, the lie stops being harmless. The details really matter here. We have to look at how Git really works, and discover the truth behind this "staged" and "unstaged" lie.

The index / staging-area / cache

At the heart of all of this confusion lies—er, stands? sits?—the index. Git's index is a terribly central and important data structure (typically contained in a single file named .git/index, although there are a lot of semi-experimental tricky augmented variants for speed these days). What the index contains is a series of slots, one group-of-slots per file name, for every file that is tracked. In fact, the definition of a tracked file is simply any file that is in the index. An untracked file is a file that is in the work-tree, but not in the index.
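A minimal sketch of the tracked/untracked distinction, assuming a throwaway repository made with mktemp; the file names tracked.txt and untracked.txt are made up for illustration:

```shell
# Throwaway repository (hypothetical; any empty repository would do).
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'one\n' > tracked.txt
printf 'two\n' > untracked.txt

# Only tracked.txt is copied into the index.
git add tracked.txt

# Files that are in the index, i.e. tracked files.
tracked=$(git ls-files)

# Files that are in the work-tree but not in the index, i.e. untracked files.
untracked=$(git ls-files --others --exclude-standard)

echo "tracked:   $tracked"
echo "untracked: $untracked"
```

Nothing has been committed here: being tracked depends only on the index, not on any commit.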

To make full sense of this concept you also need to know that Git stores each file's data in a special, frozen, compressed, read-only, Git-only format called a blob object. Each unique blob object has a unique hash ID, which means that a non-unique blob object—file data that is used more than once—can really just re-use the same hash ID over and over again. So when you make a commit and it holds a full snapshot of all of your files, what Git is really doing is using blob objects to hold the files. If the files in this commit are mostly the same as the ones in some earlier commit, Git just re-uses the existing blob objects.
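One way to see the hash re-use, sketched in a scratch repository with made-up file names: git hash-object prints the blob hash ID that a file's contents would get, and identical contents give identical IDs regardless of the file name.

```shell
# Scratch repository; a.txt, b.txt and c.txt are illustrative names.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'I am a file.\n' > a.txt
printf 'I am a file.\n' > b.txt          # same contents as a.txt
printf 'I am another file.\n' > c.txt    # different contents

# Without -w, hash-object only computes the ID; it stores nothing.
hash_a=$(git hash-object a.txt)
hash_b=$(git hash-object b.txt)
hash_c=$(git hash-object c.txt)

echo "a.txt $hash_a"
echo "b.txt $hash_b"
echo "c.txt $hash_c"
```

The first two IDs match and the third differs, which is why a snapshot that mostly repeats an earlier one costs very little extra storage.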

What the index really holds can be seen much more directly—though still in a prettied-up form—using git ls-files --stage. In a big repository, this produces a whole lot of output. Here's a snippet from a Git repository for Git:

$ git ls-files --stage
[snip]
100644 82cd0569d51d0a2d69b013a3322b6d5985a1927c 0       .mailmap
100644 ffb1bc46f2d9605f7c3fba478f918fcc288bbdd6 0       .travis.yml
100644 8c85014a0a936892f6832c68e3db646b6f9d2ea2 0       .tsan-suppressions
100644 536e55524db72bd2acf175208aef4f3dfc148d42 0       COPYING
100644 ddb030137d54ef3fb0ee01d973ec5cee4bb2b2b3 0       Documentation/.gitattributes
100644 9022d4835545cbf40c9537efa8ca9a7678e42673 0       Documentation/.gitignore
[snip]
100755 122f6479ef9f772f575ecb673e0f960900526fc1 0       GIT-VERSION-GEN
[snip]

The first number is a mode: always 100644 or 100755 for a regular file, 120000 for a symbolic link, or 160000 for a gitlink (submodule stuff). The second number (well, hexadecimal number) is the hash ID: for a file, that's the hash ID of the blob object that contains the file's data. The third number—always zero above, but not for merge conflicts—is the staging slot number. The last field is the file's name: a file's contents get stored as a blob object, but the name of that blob object is just a hash ID. The names are stored elsewhere (technically, in tree objects, but most people don't need to care about that).

The effect of all of this is that, except during merge conflicts, what the index holds is a proposed new commit. It has a copy—or really, a reference via the blob hash ID—of the already-compressed, frozen, Git-ified, read-only file data that would go with the new commit.
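To make the "proposed new commit" idea concrete, here is a rough sketch using git write-tree, which writes the current index out as a tree object and prints its hash ID. The repository, the identity settings, and the file name are assumptions for the demo only.

```shell
# Scratch repository with a hypothetical identity so that commits work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'version 1\n' > file
git add file
git commit -q -m "initial"

# Right after a commit, the index describes the same snapshot as HEAD.
head_tree=$(git rev-parse 'HEAD^{tree}')
index_tree=$(git write-tree)

# Stage a change: the index now proposes a different snapshot.
printf 'version 2\n' > file
git add file
proposed_tree=$(git write-tree)

echo "HEAD tree:     $head_tree"
echo "index before:  $index_tree"
echo "index after:   $proposed_tree"
```

The first two IDs are equal; the third differs because the index no longer matches the current commit.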

We can also look at any existing commit. For instance, here is a snippet from the same repository's master (slightly out of date with public Git right now):

$ git ls-tree HEAD
[snip]
100644 blob 82cd0569d51d0a2d69b013a3322b6d5985a1927c    .mailmap
100644 blob ffb1bc46f2d9605f7c3fba478f918fcc288bbdd6    .travis.yml
100644 blob 8c85014a0a936892f6832c68e3db646b6f9d2ea2    .tsan-suppressions
100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    COPYING
040000 tree 0785e26289f9af7de3894161a78d00b2e1d720ef    Documentation
100755 blob 122f6479ef9f772f575ecb673e0f960900526fc1    GIT-VERSION-GEN
[snip]

Note that this time there is a new kind of entry, a mode 040000 tree object, that is not present in the index. That's because commits refer to tree objects that work like directories (though they aren't quite the same as the OS's directories). The index omits them because the index holds only files (well, for submodules, gitlinks too). This is most of what keeps Git from storing an empty directory.
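The empty-directory consequence can be checked with a short sketch; the directory and file names are made up, and the `|| true` is only a guard in case a Git version complains about the empty path.

```shell
# Scratch repository; empty-dir and full-dir are illustrative names.
repo=$(mktemp -d)
cd "$repo"
git init -q

# An empty directory contains no files, so there is nothing to put in the index.
mkdir empty-dir
git add empty-dir || true
after_empty=$(git ls-files)

# Once the directory holds a file, that file (with its full path) gets an entry.
mkdir full-dir
printf 'placeholder\n' > full-dir/file.txt
git add full-dir
after_full=$(git ls-files)

echo "after adding empty-dir: '$after_empty'"
echo "after adding full-dir:  '$after_full'"
```

The index entry is full-dir/file.txt; the directory itself never appears, only the path of the file inside it.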

The upshot of all of this (the fact that the current, frozen-for-all-time commit holds a tree-ized variant of the flattened version in the index, which holds the proposed new commit) is that Git can easily compare the current commit to the index's proposed new commit. Whatever is different here, Git calls staged.

The index, the staging area, and (rarely these days) the cache are all terms for this same single thing. This one thing with three names is at the heart of making new commits. Your work-tree, where your files have their normal everyday form and where you can see and work with them, is to Git largely a sort of side-shadow. What you do with your work-tree is up to you. Every once in a while, you tell Git: Copy a file from my work-tree, compressing and Git-izing it and turning it into the frozen Git-only format, and put that object into the index. You do this using git add. The newly Git-ized data is not yet in a commit (it's not frozen for all time; you can change it by replacing it in the index), but it's now ready to be committed. Running git commit creates the commit, which freezes this for all time, making the blob objects permanent.[1]
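As a small illustration of what git add does, the sketch below (scratch repository, made-up file name) checks that the index entry after git add points at a blob whose ID is exactly the hash of the work-tree contents, and that adding again replaces that entry.

```shell
# Scratch repository; "file" is an illustrative name.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello\n' > file
expected=$(git hash-object file)    # ID the contents would get as a blob

git add file
in_index=$(git rev-parse :file)     # ID recorded in the index (slot zero)
kind=$(git cat-file -t "$in_index") # type of the stored object

# Changing the file and adding again swaps in a new blob ID.
printf 'hello again\n' > file
git add file
replaced=$(git rev-parse :file)

echo "expected: $expected"
echo "in index: $in_index ($kind)"
echo "replaced: $replaced"
```

No commit was made at any point; the index copy stays replaceable until git commit freezes it.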

Note that git status doesn't just compare the HEAD commit to the index. It also, separately, compares the index to the work-tree. Any files that are different here are printed out as unstaged. If the three active copies of some file—HEAD:file, :file, and file—are all different, then that one file will be both staged and unstaged.
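The three-different-copies case is easy to reproduce. This is a sketch in a scratch repository with a hypothetical identity and file name; git status --porcelain is used because its two status columns map directly onto the two comparisons.

```shell
# Scratch repository with a hypothetical identity so that commits work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'version 1\n' > file
git add file
git commit -q -m "initial"          # HEAD:file is version 1

printf 'version 2\n' > file
git add file                        # :file (index) is version 2

printf 'version 3\n' > file         # work-tree file is version 3

# First column: HEAD vs index. Second column: index vs work-tree.
status=$(git status --porcelain)
staged=$(git diff --cached --name-only)   # HEAD vs index
unstaged=$(git diff --name-only)          # index vs work-tree

echo "$status"                      # prints: MM file
```

The same file name shows up in both $staged and $unstaged, which is what a GUI renders as the file appearing in both lists.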


[1] Well, the blobs are permanent as long as the commit itself exists. If you get rid of a commit, and some blobs are specific to that commit, those blobs will also eventually go away. The git gc command takes care of figuring out which commits are still wanted, and which trees and blobs are used by which commits. Any unused object (commit, tree, blob, or the last kind, annotated tag) can be removed at this point.
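A rough sketch of an unused object being cleaned up. To keep it short it uses a blob that was staged and then replaced in the index, rather than a blob from a discarded commit, and it passes --prune=now to skip the usual grace period; both choices are assumptions made for the demo.

```shell
# Scratch repository with a hypothetical identity so that commits work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'version 1\n' > file
git add file
git commit -q -m "initial"

# Stage some contents, remember the blob ID, then replace the index entry.
printf 'staged, then replaced\n' > file
git add file
orphan=$(git rev-parse :file)
printf 'version 2\n' > file
git add file                        # nothing refers to $orphan any more

before=$(git cat-file -t "$orphan") # still present as a loose object

# Normally unused objects survive for a while; --prune=now removes them at once.
git gc -q --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```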


During a conflicted merge, the index takes on an expanded role

When you perform a true merge, which has three inputs—a merge base and two tip commits—Git temporarily has to shove three copies of each file into the index. This is what the nonzero staging slot numbers are for.

Suppose that the merge base commit has a version of file that reads:

I am a file.

Suppose further that the left-side (current branch) version of file reads:

I am a file
with two lines.

Meanwhile the right-side version reads:

I am the ghost of a file, killed by Macbeth's two hired assassins.

Since these two changes to this file cannot be combined automatically, Git will:

  • leave the merge base version of the file in :1:file
  • leave the left/local/HEAD/--ours version of the file in :2:file
  • leave the right/remote/--theirs version of the file in :3:file

In addition to these three versions, there is of course still a HEAD:file (identical at this point to the slot-2 version) and a work-tree version of file. The work-tree version contains Git's conflict markers. So now, instead of three active copies of the file, there are five of them!
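The example above can be reproduced in a scratch repository. This is a sketch: the identity settings are hypothetical, the branch name new-feature is borrowed from the conflict markers quoted earlier, and `git checkout -q -` is used to return to the starting branch without assuming what it is called.

```shell
# Scratch repository with a hypothetical identity so that commits work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Merge base version.
printf 'I am a file.\n' > file
git add file
git commit -q -m "base"

# Right-side (--theirs) version on a new branch.
git checkout -q -b new-feature
printf "I am the ghost of a file, killed by Macbeth's two hired assassins.\n" > file
git commit -q -am "right side"

# Left-side (--ours) version on the original branch.
git checkout -q -
printf 'I am a file\nwith two lines.\n' > file
git commit -q -am "left side"

# The merge stops with a conflict, so a nonzero exit status is expected here.
git merge new-feature >/dev/null 2>&1 || true

git ls-files --stage                # three entries for file: slots 1, 2 and 3
stages=$(git ls-files --stage | awk '{print $3}' | xargs)

base=$(git show :1:file)            # merge base copy
ours=$(git show :2:file)            # left/HEAD copy
theirs=$(git show :3:file)          # right copy
cat file                            # work-tree copy, with conflict markers
```

Together with HEAD:file, that makes the five copies: three index slots, the commit copy, and the marked-up work-tree file.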

Your job at this point is to come up with the correct combined file, and then put that into slot zero of the index, removing the three other copies. You can do this by editing the work-tree copy and running git add file. The git add command knows that if there are three nonzero stage copies, and you're adding one, it should go into staging slot zero for that file, removing the other three. Now you're back down to just three copies and git status can tell you a nice useful tale—a useful lie, talking about staged-ness—as to whether the index copy matches the HEAD and/or work-tree copies.

You can also use a merge tool or some GUI thing to produce the correct merged file. As always, the end goal is to stuff the correct copy of file into staging slot zero, emptying out staging slots 1, 2, and 3. That resolves the merge conflict and leaves you with something you can commit.
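Resolution by hand might look like the following sketch. It rebuilds the same conflict in a scratch repository (hypothetical identity, branch name new-feature taken from the markers quoted earlier); the "resolved" contents are an arbitrary choice for the demo.

```shell
# Scratch repository with a hypothetical identity so that commits work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Recreate the conflict: base, right side, left side, then merge.
printf 'I am a file.\n' > file
git add file
git commit -q -m "base"
git checkout -q -b new-feature
printf "I am the ghost of a file, killed by Macbeth's two hired assassins.\n" > file
git commit -q -am "right side"
git checkout -q -
printf 'I am a file\nwith two lines.\n' > file
git commit -q -am "left side"
git merge new-feature >/dev/null 2>&1 || true   # conflict expected

# Write the combined contents (an arbitrary resolution for this demo).
printf 'I am the ghost of a file\nwith two lines.\n' > file

# git add puts the result in slot zero and clears slots 1, 2 and 3.
git add file
unmerged=$(git ls-files -u)                      # empty once resolved
slot=$(git ls-files --stage file | awk '{print $3}')

# With no conflicted entries left, the merge can be committed.
git commit -q -m "Merge branch 'new-feature'"
echo "unmerged entries: '$unmerged', slot: $slot"
```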

While there are five copies of the file, though, just saying staged or unstaged or both doesn't cover the real situation. If you want to write a merge tool, you need to know how to extract the three versions of each conflicted file—or, in the case of modify/delete or rename/delete or rename/rename conflicts, what else to do about the mess. (There's a bit of a problem with this, as what's left behind in the index is not sufficient to untangle some of the rename cases.)
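For the simple content-conflict case, extracting the versions could be sketched as below: git ls-files -u lists only the conflicted entries, and each listed blob is written to a separate file. The output naming scheme (file.stage1 and so on) and the scratch setup are made up for illustration. Slots that do not exist, as in a modify/delete conflict, are simply not listed, and the rename cases mentioned above are not handled at all.

```shell
# Scratch repository with a hypothetical identity so that commits work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Recreate the conflict: base, right side, left side, then merge.
printf 'I am a file.\n' > file
git add file
git commit -q -m "base"
git checkout -q -b new-feature
printf "I am the ghost of a file, killed by Macbeth's two hired assassins.\n" > file
git commit -q -am "right side"
git checkout -q -
printf 'I am a file\nwith two lines.\n' > file
git commit -q -am "left side"
git merge new-feature >/dev/null 2>&1 || true   # conflict expected

# Each line of ls-files -u is: mode, hash, slot, then a tab and the path.
# This sketch assumes top-level paths; nested paths would need mkdir -p.
out=$(mktemp -d)
git ls-files -u | while read -r mode hash slot path; do
    git cat-file blob "$hash" > "$out/$path.stage$slot"
done

ls "$out"
```

A merge tool would then read the stage1 (base), stage2 (ours), and stage3 (theirs) files, write its result over the work-tree copy, and finish with git add.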

torek answered Nov 11 '22 21:11