
Are "git add file" and "git checkout -- file" symmetric?

I have the following understanding of git add file and git checkout -- file (but I am not sure if it is correct).

Whenever we edit files with a text editor, we do it in the working directory. At any time we can move a file to the so-called staging area by executing git add file_name. If we edit the file again (after git add), we change the file in the working directory only, so the working directory holds the file in a "new" state while the staging area holds it in the "old" state.

When we use git add again, we bring the file in the staging area to the "new" state (the state from the working directory).

If we do git checkout -- file_name, I assume that we take the file from the staging area and use it to overwrite the file in the working directory. In this way we can bring the file in the working directory back to the "old" state. Is that correct?

What is also not clear to me is whether we copy or move the file from the staging area. In other words, does git checkout -- file change the state of the file in the staging area? Can we say that after git checkout -- file, the file in the staging area reverts to its previous state in the staging area?

Roman asked Dec 05 '13 10:12


1 Answer

It's almost, but not quite, that symmetric.

It's true that git add file copies the file to the stage (aka "index"). However, the way it does so is a bit weird.

Inside a git repo, everything is stored as a git "object". Each object has a unique name, its SHA-1 (a 40-character hexadecimal string like 753be37fca1ed9b0f9267273b82881f8765d6b23; that one is from an actual .gitignore I have here). The name is constructed by computing the hash of the file's contents (more or less: git first prepends a short header giving the object type and size, so that a file cannot be mistaken for a directory tree or commit and cause a hash collision, for instance). Git assumes that no matter the contents, the SHA-1 will be unique: no two different files, trees, commits, or annotated tags will ever hash to the same value.
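As a rough illustration of how a blob's name is derived, the sketch below hashes the same content twice: once with git hash-object and once by hand, as the SHA-1 of the header "blob <size>" plus a NUL byte plus the content. The sample content and variable names are made up for the example, and it assumes a sha1sum binary is available (on some systems the equivalent tool is shasum).

```shell
# Sample content; with the trailing newline it is 6 bytes long.
content='hello'

# What git computes. Without -w nothing is written, and no repository is needed.
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# The same hash by hand: header "blob 6", a NUL byte, then the content.
manual_id=$(printf 'blob 6\0%s\n' "$content" | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_id"
echo "manual SHA-1:    $manual_id"
```

The two printed IDs should be identical, which is why identical contents always map to the same object no matter what the file is called.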

Files (and symbolic links) are objects of type "blob". So a file that's in the git repo is hashed, and somewhere, git has a mapping from "file named .gitignore" to "hash value 753be37fca1ed9b0f9267273b82881f8765d6b23".

In the repo, directory trees are stored as objects of type "tree". A tree object contains a list of names (like .gitignore), modes, object types (another tree or a blob), and SHA-1s:

$ git cat-file -p HEAD:
100644 blob 753be37fca1ed9b0f9267273b82881f8765d6b23    .gitignore
[snip]

A commit object gets you (or git) a tree object, which eventually gets you the blob IDs.
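The chain from commit to tree to blob can be followed by hand with plumbing commands. This is only a sketch in a throwaway repository: the temporary directory, the committer identity, and the .gitignore contents are placeholders invented for the example.

```shell
# Build a throwaway repository with a single committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo '*.o' > .gitignore
git add .gitignore
git -c user.name=Example -c user.email=example@example.com commit -q -m 'initial'

# HEAD names a commit object ...
commit_type=$(git cat-file -t HEAD)
# ... the commit points at a tree object ...
tree_id=$(git rev-parse 'HEAD^{tree}')
tree_type=$(git cat-file -t "$tree_id")
# ... and the tree maps the name .gitignore to a blob ID.
blob_id=$(git rev-parse HEAD:.gitignore)
blob_type=$(git cat-file -t "$blob_id")

echo "$commit_type -> $tree_type $tree_id -> $blob_type $blob_id"
git cat-file -p "$tree_id"   # one line per entry: mode, type, ID, name
git cat-file -p "$blob_id"   # the file's contents
```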

The staging area ("index"), on the other hand, is simply a file, .git/index. This file contains [1] the name (in a funny slightly-compressed form that flattens out directory trees), the "stage number" in the case of merge conflicts, and the SHA-1. The actual file contents are, again, a blob in the git repo. (Git does not store directories in the index: the index only has actual files, using that flattened format.)
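To see what the index holds, git ls-files --stage prints one line per entry. The sketch below uses a throwaway repository with invented file names, one of them nested, to show that entries are listed by full path rather than as a directory hierarchy.

```shell
# Throwaway repository with one top-level file and one nested file.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src/lib
echo 'top' > README
echo 'nested' > src/lib/util.txt
git add README src/lib/util.txt

# Each line shows: mode, blob ID, stage number, then the full path.
index_listing=$(git ls-files --stage)
echo "$index_listing"
```

Both entries should show stage 0, and there is no separate entry for src or src/lib: only files appear.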

So, when you do:

git add file_name

git does this (more or less, and I'm deliberately glossing over filters):

  1. Compute the hash for the contents of file file_name (git hash-object -t blob).
  2. If that object is not already in the repo, write it into the repo (using the -w option to hash-object).
  3. Update .git/index (or $GIT_INDEX_FILE) so that it has the mapping under the name file_name, to the name that came out of git hash-object. This is always a "stage 0" entry (which is the normal, no-merge-conflict version).

Thus, the file isn't really "in" the staging area, it's really "in" the repo itself! What's in the staging area is the name to SHA-1 mapping.
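The three steps can be reproduced with plumbing commands and compared with what git add records. This is a sketch under simplifying assumptions: a throwaway repository, a regular non-executable file (mode 100644), and no filters configured; file_name and its contents are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'first draft' > file_name

# Steps 1 and 2: hash the contents and write the blob into the object store.
blob_id=$(git hash-object -w -t blob file_name)

# Step 3: record "file_name -> blob_id" in the index as a stage 0 entry.
git update-index --add --cacheinfo 100644,"$blob_id",file_name
plumbing_entry=$(git ls-files --stage file_name)

# Drop the entry again and let plain "git add" do the same job.
git rm -q --cached file_name
git add file_name
porcelain_entry=$(git ls-files --stage file_name)

echo "$plumbing_entry"
echo "$porcelain_entry"
```

If the two printed entries match, the manual route and git add produced the same name-to-SHA-1 mapping.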

By contrast, git checkout [<tree-ish>] -- file_name does this:

  1. If given a <tree-ish> (commit name, tree-object ID, etc—basically anything git can resolve to a tree), do the name lookup from the tree found by converting the argument to a tree object. Using the object ID thus located, update the hash in the index, as stage 0. (If file_name names a tree object, git recursively handles all the files in the directory the tree represents.) By creating stage 0 entries, any merge conflicts on file_name are now resolved.

    Otherwise, do the name lookup in the index (if file_name is a directory, git matches every index entry under that directory). Convert the file_name to an object ID (which will be a blob by this point). If there is no stage-0 entry, error out with the "unmerged" message, unless given the -m, --ours, or --theirs options. Using -m will "un-merge" the file (remove the stage 0 entry and re-create the conflicted merge [2]), while --ours and --theirs leave any stage 0 entry in place (a resolved conflict stays resolved).

  2. In any case, if this has not yet errored-out, use the blob SHA-1(s) thus located to extract the repo copy (or copies, if file_name names a directory) into the working directory.
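The difference between the two forms can be observed directly by comparing the index entry before and after each checkout. The sketch below assumes a throwaway repository; the contents v1, v2, v3 and the committer identity are placeholders for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'v1' > file_name
git add file_name
git -c user.name=Example -c user.email=example@example.com commit -q -m 'v1'

echo 'v2' > file_name
git add file_name          # index now maps file_name to the v2 blob
echo 'v3' > file_name      # this change exists in the working directory only

# Without a tree-ish: the index is only read, the working directory is overwritten.
index_before=$(git ls-files --stage file_name)
git checkout -- file_name
index_after=$(git ls-files --stage file_name)
worktree_after_plain=$(cat file_name)

# With a tree-ish: the index entry is updated first, then the working directory.
git checkout HEAD -- file_name
index_after_head=$(git ls-files --stage file_name)
worktree_after_head=$(cat file_name)

echo "plain checkout: worktree=$worktree_after_plain"
echo "HEAD checkout:  worktree=$worktree_after_head"
```

In the first case the staged v2 is copied out and the index is untouched; in the second both the index and the working directory go back to the committed v1.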

So, the short version is "yes and no": git checkout sometimes modifies the index, and sometimes only uses it. However, the file itself is never stored in the index, only in the repo. If you git add a file, change it some more, and git add it again, this leaves behind what git fsck will find as a "dangling blob": an object with no reference.
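The dangling blob is easy to reproduce. This sketch uses a throwaway repository with made-up contents and assumes no garbage collection runs in between, so the first blob is still on disk when git fsck looks for it.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'first draft' > file_name
git add file_name
first_blob=$(git rev-parse :file_name)    # the blob the index points at now

echo 'second draft' > file_name
git add file_name                         # the index entry now names a new blob
second_blob=$(git rev-parse :file_name)

# The first blob is still in the object store, but nothing refers to it.
fsck_output=$(git fsck 2>&1)
echo "$fsck_output"
```

The output should include a "dangling blob" line carrying the first blob's ID, while the second blob is still reachable through the index.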


[1] I'm deliberately omitting a lot of other stuff in the index that is there to make git perform well, and allow --assume-unchanged etc. (These are not relevant to the add/checkout action here.)

[2] This re-creation respects any change to merge.conflictstyle, so if you decide you like diff3 output and already have a conflicted merge without the diff3 style, you can change the git config and use git checkout -m to get a new working-directory merge with the new style.

torek answered Sep 24 '22 02:09