I accidentally added a bunch of text files to my git repo and attempted to unstage them (prior to commit):
git reset dir/*.txt
When the command was run it said:
unstaged changes after reset:
dir2/file.h
dir4/file2.cc
...
The files had no relation to the reset wildcard. As far as I can tell, the files are still staged for commit as modified and appear to be intact. What is git trying to tell me?
Git is trying to be helpful here—perhaps overly helpful, in this case. The phrase unstaged changes is a way of thinking about things that is meant to make using Git simpler. This doesn't always work, because Git is complicated.
Here's the underlying reality: Git has, at all times,1 three copies of each file. Two of these copies literally can't be seen, at least not with the file navigation tools that you normally use.
Two of these three copies make sense, when you think about what a commit is and does:
Each commit holds a full snapshot of every file, saved forever. The snapshot version of the file is from how that file looked at the time you (or whoever) made the commit. (Each commit also holds some other stuff—some metadata, or information about the commit itself—but we'll ignore that here.)
Because each commit has a full snapshot of every file, the files stored inside commits aren't stored as ordinary everyday files. If they were, your repository would grow enormously fat ridiculously quickly.
So, the files inside Git commits are stored in a special, read-only, Git-only, compressed and de-duplicated format. Because they're read-only, it's entirely safe for commits to share these copies of files. Making a new commit with ten thousand files, but 9999 of them the same as the last commit, really just re-uses the 9999 files and snapshots only the one changed file. And, if the file has been changed back to the way it was in some earlier commit, that last file is shared with the earlier commit, so that the new snapshot takes no space at all.2
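The sharing described above can be observed directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names and the "Example" identity are made up for illustration. It asks Git for the internal object ID of each stored file and shows that identical content, within one commit or across commits, resolves to the same stored object.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'hello\n' > a.txt
printf 'hello\n' > b.txt          # same content as a.txt, different name
git add a.txt b.txt
git commit -q -m "first"

printf 'changed\n' > b.txt        # only b.txt changes in the second commit
git add b.txt
git commit -q -m "second"

# Object IDs of the stored (compressed, read-only) copies in each commit.
blob_a_first=$(git rev-parse HEAD~1:a.txt)
blob_b_first=$(git rev-parse HEAD~1:b.txt)
blob_a_second=$(git rev-parse HEAD:a.txt)
blob_b_second=$(git rev-parse HEAD:b.txt)

echo "first commit:  a=$blob_a_first b=$blob_b_first"
echo "second commit: a=$blob_a_second b=$blob_b_second"
```

In the first commit both names share one object ID, and the unchanged a.txt keeps that same ID in the second commit; only b.txt gets a new one.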
The problem with all of the above is that the files inside commits are completely unusable for getting any actual work done: they can only be read by Git, and nothing—not even Git itself—can write to them. So, to use a commit, Git has to copy it out, expanding the files from their special Git-only format into ordinary everyday files. These everyday-form files go into what Git calls your working tree or work-tree.
So this all makes sense: the two "active" copies of some file, such as README.md or whatever, are the current commit version (this one is read-only and is in whichever commit you've selected to be the current commit, and only Git can see and read it) and your working tree version, which isn't actually in Git. Git has extracted it into a work area, and you are now working with it, but it's not in the repository. It got copied out of the repository; it may or may not have been changed since then.
Two copies are all we would really need, and other version control systems (those that are not Git) stop here, with just the two "active" copies. But for whatever reason, good or ill, Git doesn't stop here. Git inserts, sort of between the frozen README.md and the useful one, a third copy. This third copy is in something that Git calls, variously, the index, or the staging area, or (rarely these days) the cache. These three names are all names for the same thing.3
1Well, most times. If you break things down fine enough—especially, if you use some of Git's non-user-facing tools—you can do fun tricks or whatever. There are also so-called bare repositories, which don't have a work-tree at all, except if you assign one temporarily; that's another complication we'll just ignore here.
2Except, that is, for the space required to hold the metadata. The details here get pretty sticky too. The point of all of this is that by re-using old files, Git can keep its repository small. Given the way the files get compressed, a Git repository with many commits is sometimes smaller than any checked-out version!
3Some internal parts of Git make a distinction between the index, which is usually the file .git/index, and the cache, which at that point is an in-memory data structure. The rather ancient git apply command has two separate flags, --index and --cached, that do different things. But git rm --cached, for instance, really means remove from the index; here the words index and cache are truly synonymous.
Technically, what's in the index isn't the file itself: it's the file's name (the full name of the file as Git sees it, complete with forward slashes, such as path/to/file.ext) and a whole bunch of internal stuff, some of which you can see with git ls-files --stage. (Try it, but be aware that it spills out lots of output without pausing.)
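To see what one index entry looks like without the flood of output, here is a small sketch in a throwaway repository with a single file. The repository location and file content are assumptions for illustration; path/to/file.ext is just the example name from the text.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q

mkdir -p path/to
printf 'data\n' > path/to/file.ext
git add path/to/file.ext

# Each line is: mode, object ID of the stored copy, stage number, TAB, path.
entry=$(git ls-files --stage)
echo "$entry"

# The object ID in the entry is the ID Git computes for the file's content.
entry_id=$(echo "$entry" | cut -d' ' -f2)
content_id=$(git hash-object path/to/file.ext)
echo "entry_id=$entry_id content_id=$content_id"
```

In a real repository, piping the command through a pager (git ls-files --stage | less) keeps the output manageable.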
Putting the technical details aside, though, what the index achieves is that it holds your proposed next commit. The files in the index are in the same form as those in commits—they're pre-compressed and pre-de-duplicated—but unlike the committed copy, Git can overwrite them, by removing that de-duplicated copy and making a new de-duplicated copy.
Initially, when you git checkout some particular commit (the latest or tip commit on branch feature, for instance), Git fills its index with the files from that commit, and also fills in your work-tree with those files. The result is that all three active copies match. The committed copy, which is read-only, matches the index copy. The index copy, which can be replaced, matches both the committed copy and your work-tree copy.
As you do your work, you will modify some file(s). These will naturally be your copies of the files, which are in a usable format. Git doesn't use these files! Git created them earlier, by extracting them from a commit, but other than that, Git just leaves these files for you. If you've changed one, you need to tell Git to do something about its index / staging-area copy.
What you do is run git add. This has Git read your work-tree copy, compress it, de-duplicate it against all the stored files, and update its index copy. Now Git's index copy matches your work-tree copy.
Note that because there are three copies, you can get all three of them out of sync: just check out some commit, modify some file, run git add on that file, and then modify the file again. Now the frozen-for-all-time committed copy is different from the index copy, which is the one you add-ed earlier, and your working copy is still different from that because you've changed it again without git add-ing.
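The out-of-sync situation is easy to reproduce. This is a minimal sketch in a throwaway repository; the file content and the "Example" identity are made up. It reads each of the three copies separately: git show HEAD:README.md prints the committed copy, git show :README.md prints the index copy, and cat prints the work-tree copy.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'v1\n' > README.md
git add README.md
git commit -q -m "initial"        # committed copy: v1

printf 'v2 staged\n' > README.md
git add README.md                 # index copy: v2 staged

printf 'v3 work-tree only\n' > README.md   # work-tree copy, not added

committed=$(git show HEAD:README.md)
staged=$(git show :README.md)
working=$(cat README.md)

echo "committed: $committed"
echo "index:     $staged"
echo "work-tree: $working"
```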
git status works by doing two diffs
When you run git status, it first prints out some overall information that people find helpful, such as the current branch name,4 how far this branch is "ahead" or "behind" some other branch or remote-tracking name, and so on. Then it gets to the files.
The first set of files it lists—if it lists any here—are the ones it calls staged for commit. What it's really doing, though, is comparing the current commit to the index. For each file that is the same, it says nothing at all. For each file that is different, it says staged for commit.
The second set of files it lists, if any again, are the ones it calls not staged for commit. Again, though, what it's really doing is comparing its index to your work-tree. For each file that is the same, it says nothing at all. For each file that is different, it says not staged for commit.
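The two comparisons can also be run by hand. In this sketch (throwaway repository, made-up file names), git diff --cached compares the current commit to the index, and plain git diff compares the index to the work-tree; these correspond to the two lists that git status prints.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'one\n' > staged.txt
printf 'one\n' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

printf 'changed and added\n' > staged.txt
git add staged.txt                          # index now differs from HEAD
printf 'changed, not added\n' > unstaged.txt   # work-tree differs from index

# First diff: current commit vs index ("staged for commit").
head_vs_index=$(git diff --cached --name-only)
# Second diff: index vs work-tree ("not staged for commit").
index_vs_worktree=$(git diff --name-only)

echo "HEAD vs index:      $head_vs_index"
echo "index vs work-tree: $index_vs_worktree"
git status --short
```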
4Git stores the current branch name in something it calls HEAD. There's one HEAD for each work-tree, and one index for each work-tree; the main HEAD and index are .git/HEAD and .git/index, normally, and any added work-trees get a new pair. You're not supposed to need to know this, but it's handy sometimes to just look at .git/HEAD (it's currently just a plain-text file) to get a good feel for this. This might all change in the future, though: HEAD used to be a symbolic link, for instance.
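For the curious, here is a small sketch of peeking at HEAD, assuming a throwaway repository that uses Git's default on-disk reference storage (other storage back ends may show something different in the file). The supported way to ask the same question is git symbolic-ref.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'x\n' > file.txt
git add file.txt
git commit -q -m "initial"
git checkout -q -b feature        # make "feature" the current branch

head_file=$(cat .git/HEAD)                 # raw file content
branch=$(git symbolic-ref --short HEAD)    # the supported query

echo "file says: $head_file"
echo "Git says:  $branch"
```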
git reset
The git reset command is complicated.5 We'll ignore most of the complications, and concentrate just on the kind of git reset you got when you ran:
git reset dir/*.txt
This particular kind of git reset is now something you can achieve with the new (since Git 2.23) git restore, using its --staged option. It copies files from the current commit, to Git's index, without touching your work-tree.6
When you did this, you had something find a list of file names. This gets a bit complicated because the something might be your shell, or it might be Git, and if it's Git, the set of file names Git will find could be different from the set your shell will find. For simplicity, let's just assume that the set of files found either way was the same: all the files that the shell would find with dir/*.txt are the same set of files that Git would find matching against dir/*.txt in the current commit. So Git copied all of these files from the current commit, into Git's index.
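One case where the two sets differ is worth a quick sketch. An unquoted dir/*.txt is expanded by the shell against the work-tree before Git ever runs, while a quoted 'dir/*.txt' reaches Git unexpanded, and Git's own matching lets the * cross directory separators. The repository and file names below are made up, and git ls-files is used here only as a harmless way to see what a pattern matches in the index.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q

mkdir -p dir/sub
printf 'a\n' > dir/top.txt
printf 'b\n' > dir/sub/nested.txt
git add dir

# Unquoted: the shell expands the glob; "*" does not cross "/".
shell_matches=$(ls dir/*.txt)

# Quoted: Git receives the pattern itself and matches it against the index;
# here "*" can also match names inside subdirectories of dir/.
git_matches=$(git ls-files -- 'dir/*.txt')

echo "shell found:"; echo "$shell_matches"
echo "Git found:";   echo "$git_matches"
```

Quoting the pattern is therefore the way to make sure Git, not the shell, decides which files are affected.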
If they were already in Git's index, as that version of that file, this has no effect. But wherever any file in Git's index / staging-area was different (presumably because you'd used git add on it, after changing your copy) this overwrote the updated index copy, setting it back to match the committed copy instead. So this has the effect of undoing any git adds you ran earlier, for files matching dir/*.txt.
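The undo-the-add effect can be checked in a throwaway repository. This sketch assumes Git 2.23 or later (for git restore) and uses made-up file names; it stages a change, un-stages it with git reset, then repeats the exercise with git restore --staged, confirming each time that the index matches the commit again while the work-tree copy keeps the edit.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir
printf 'old\n' > dir/notes.txt
git add dir
git commit -q -m "initial"

printf 'new content\n' > dir/notes.txt
git add dir/notes.txt                     # index copy now differs from HEAD

git reset -q -- 'dir/*.txt'               # copy HEAD's version back into the index
after_reset=$(git diff --cached --name-only)
work=$(cat dir/notes.txt)                 # work-tree copy is untouched

git add dir/notes.txt                     # stage it again
git restore --staged -- 'dir/*.txt'       # same effect via the newer command
after_restore=$(git diff --cached --name-only)

echo "staged after reset:   '$after_reset'"
echo "staged after restore: '$after_restore'"
echo "work-tree content:    $work"
```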
Having done that, git reset now does a partial git status. That is, it compares every file in Git's index vs the copy of that same file in your work-tree. For those that are different, Git lists them out as "unstaged". Git didn't touch dir2/file.h in Git's index, but it was already different in your work-tree than it was in Git's index. So git reset's output here includes it. The same goes for the other listed files.
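The situation in the question can be reproduced in a few lines. This is a sketch under assumptions: a throwaway repository, dir2/file.h taken from the question, and an invented dir/a.txt standing in for the accidentally added text files. The reset only touches the index entry for dir/a.txt, yet the report mentions dir2/file.h, because that file's work-tree copy already differed from its index copy.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir dir2
printf '// header\n' > dir2/file.h
git add dir2
git commit -q -m "initial"

printf '// header, edited later\n' > dir2/file.h   # modified, never re-added
printf 'scratch\n' > dir/a.txt
git add dir/a.txt                                  # the accidental add

# Un-stage the text file; capture what git reset reports afterwards.
output=$(git reset -- 'dir/*.txt' 2>&1)
echo "$output"

still_staged=$(git diff --cached --name-only)      # nothing left staged
echo "still staged: '$still_staged'"
```

The report lists dir2/file.h even though the pattern never matched it; the listing is informational and nothing in that file was changed.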
5I'm of the opinion that it's too complicated, and deserves to be split up the way git checkout got split into git switch and git restore. Of course for compatibility purposes, there will still be a git reset, just as Git 2.23 and later still have git checkout, even after the split-up.
6git restore is actually more powerful because you can pick any commit, not just the current one, and you can choose whether to copy it to Git's index, your work-tree, or both. So if git reset were split up as I am thinking of in footnote 5, one of the two commands would be git restore.
What git reset does, after successfully modifying some of the index copies of files, is run a partial git status. This compares the index copies of files to your work-tree copies. It doesn't just compare files that were individually re-set, but rather all index entries. Since the index lists every file that will be in the next commit, this can be a lot of files.
Note that what you are doing, as you modify files and then run git add, is arranging the intermediate copies of each file, in Git's "staging area" (index), so as to arrange everything for the next commit. This is why we call it the staging area: we put particular copies of particular files "up on stage", and then take a photographic snapshot that we call a commit. That commit is built from what's on the stage, which isn't necessarily the same as what you are working with.
Other version control systems don't do this: they don't have a separate staging area, and when you make a new commit, they snapshot your working tree. This has its own pluses and minuses, and winds up needing a list of files (often called a manifest) since working trees tend to have a lot of files in them that shouldn't be committed. Git uses its index for this purpose: if you don't copy a file into the index—i.e., don't put it up "on stage" for the later snapshot—it won't be in the snapshot.
But, because the index has a full copy of every file, that makes three copies, which makes for these odd situations. Since you can't see the index copy, you need something (git status, usually) that will compare the index copies against the working tree copies, and let you know if you want to update your proposed next commit. We "see" the index by its shadows: when it matches the current commit and/or work-tree, we don't see anything. Having fewer shadows makes the ones that are there stand out, so this works fairly well. But it's tricky!