
What's the difference between 'git rm --cached', 'git restore --staged', and 'git reset'?

I have come across the following three ways to unstage files that were staged with 'git add':

git rm --cached <file>
git restore --staged <file>
git reset <file>

Their behavior looked exactly the same when I ran the commands one by one. What exactly are the differences between them?

asked Dec 24 '20 05:12 by sekai_no_suda



1 Answer

Two are the same; one is not, except under particular circumstances.

To understand this, remember that:

  • a commit holds a snapshot of all files that Git knew about, in the form they had when you told Git to commit them;
  • the snapshot is made from the files that are in Git's index, aka staging-area, aka cache (three terms for the same thing); and
  • git add means make the copy in the index/staging-area/cache match the copy in my working tree (by copying from the working tree if the working tree copy is updated, or by removing from the index if the working tree copy is removed).

So the index / staging-area contains, at all times, your proposed next commit, and was initially seeded from your current commit when you did a git checkout or git switch to obtain that commit.[1] Your working tree thus contains a third copy[2] of each file, with the first two copies being the one in the current commit aka HEAD, and the one in the index.
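To make the "three copies" idea concrete, here is a minimal sketch in a throwaway repository. The file name file.txt, the contents, and the identity settings are all made up for illustration; git show HEAD:path reads the committed copy and git show :path reads the index copy.

```shell
#!/bin/sh
# Sketch: one file, three different copies (HEAD, index, working tree).
set -e
repo=$(mktemp -d)            # throwaway repository (hypothetical)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo one > file.txt
git add file.txt
git commit -q -m "initial"   # HEAD copy: "one"

echo two > file.txt
git add file.txt             # index copy: "two"

echo three > file.txt        # working tree copy: "three"

head_copy=$(git show HEAD:file.txt)
index_copy=$(git show :file.txt)
tree_copy=$(cat file.txt)
echo "HEAD: $head_copy, index: $index_copy, working tree: $tree_copy"
# prints: HEAD: one, index: two, working tree: three
```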

With that in mind, here's what each of your commands does:

  • git rm --cached file: removes the copy of the file from the index / staging-area, without touching the working tree copy. The proposed next commit now lacks the file. If the current commit has the file, and you do in fact make a next commit at this point, the difference between the previous commit and the new commit is that the file is gone.

  • git restore --staged file: Git copies the file from the HEAD commit into the index, without touching the working tree copy. The index copy and the HEAD copy now match, whether or not they matched before. A new commit made now will have the same copy of the file as the current commit.

    If the current commit lacks the file, this has the effect of removing the file from the index. So in this case it does the same thing as git rm --cached.

  • git reset file: this copies the HEAD version of the file to the index, just like git restore --staged file.

(Note that git restore, unlike this particular form of git reset, can overwrite the working tree copy of some file, if you ask it to do so. The --staged option, without the --worktree option, directs it to write only to the index.)
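As a rough illustration of the three commands on a file that does exist in HEAD, the following sketch stages a change and then undoes the staging each way. The repository, the name file.txt, and the contents are assumptions made up for the demonstration.

```shell
#!/bin/sh
# Sketch: the three "unstage" commands applied to a file that HEAD contains.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo v1 > file.txt
git add file.txt
git commit -q -m "initial"

echo v2 > file.txt
git add file.txt                       # staged: status would show "M  file.txt"

git restore --staged file.txt          # index copy := HEAD copy
after_restore=$(git status --short)    # " M file.txt"

git add file.txt
git reset -q file.txt                  # same effect on the index
after_reset=$(git status --short)      # " M file.txt"

git add file.txt
git rm -q --cached file.txt            # index copy removed entirely
after_rm=$(git status --short)         # "D  file.txt" and "?? file.txt"

printf 'restore:\n%s\nreset:\n%s\nrm --cached:\n%s\n' \
    "$after_restore" "$after_reset" "$after_rm"
```

The first two leave the file tracked and merely unstaged; the third stages a deletion and leaves the working tree file untracked.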

Side note: many people initially think that the index / staging-area contains only changes, or only changed files. This is not the case, but if you were thinking of it this way, git rm --cached would appear to be the same as the other two. Since that's not how the index works, it's not.
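One way to check that the index holds every file, not just changed ones, is to list it right after a commit, when nothing is staged relative to HEAD. This is only a sketch; the file names a.txt and b.txt are invented for the example.

```shell
#!/bin/sh
# Sketch: the index lists all tracked files even when no changes are staged.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "initial"

staged_changes=$(git diff --cached --name-only)   # empty: index matches HEAD
index_entries=$(git ls-files)                     # a.txt and b.txt

git rm -q --cached a.txt
entries_after_rm=$(git ls-files)                  # only b.txt remains
staged_after_rm=$(git diff --cached --name-only)  # a.txt (a staged deletion)

echo "staged before rm: '$staged_changes'"
echo "index before rm:  $(echo $index_entries)"
echo "index after rm:   $entries_after_rm"
echo "staged after rm:  $staged_after_rm"
```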


[1] There are some quirky edge cases when you stage something, then do a new git checkout. Essentially, if it's possible to keep a different staged copy in place, Git will do so. For the gory details see Checkout another branch when there are uncommitted changes on the current branch.

[2] The committed copy, and any staged copy, are actually kept in the form of an internal Git blob object, which de-duplicates contents. So if these two match, they literally just share one underlying copy. If the staged copy differs from the HEAD copy, but matches any (perhaps even many) other existing committed copy or copies, the staged copy shares the underlying storage with all those other commits. So calling each one a "copy" is overkill. But as a mental model, it works well enough: none can ever be overwritten; a new git add will make a new blob object if needed, and if nobody uses some blob object in the end, Git eventually discards it.
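The sharing of blob objects can be observed by comparing object IDs. In this sketch (file names and contents are assumptions), the HEAD copy, the index copy, and a second staged file with identical contents all resolve to the same blob ID.

```shell
#!/bin/sh
# Sketch: identical contents share a single blob object.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo hello > a.txt
git add a.txt
git commit -q -m "initial"

head_blob=$(git rev-parse HEAD:a.txt)   # blob ID of the committed copy
index_blob=$(git rev-parse :a.txt)      # blob ID of the index copy

cp a.txt b.txt
git add b.txt
copy_blob=$(git rev-parse :b.txt)       # a different path, same contents

echo "HEAD:a.txt $head_blob"
echo ":a.txt     $index_blob"
echo ":b.txt     $copy_blob"
```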


A specific example

In a comment, pavel_orekhov says:

It is still not clear to me where "git rm --cached" and "git restore --staged" differ. Could you please show a series of commands with these 2 that exhibit different behavior?

Let's check out a specific commit in the Git repository for Git itself (clone it first if needed, e.g., from https://github.com/git/git.git):

$ git switch --detach v2.35.1
HEAD is now at 4c53a8c20f Git 2.35.1

Your working tree will contain files named Makefile, README.md, git.c, and so on.

Let's now modify some existing file in the working tree:

$ ed Makefile << end
> 1a
> foo
> .
> w
> q
> end
107604
107608
$ git status --short
 M Makefile

The > signs are from the shell asking for input; the two numbers are the byte counts of the file Makefile before and after the edit. Note the output from git status is space-M-space-Makefile, indicating that the index or staging area copy of Makefile matches the HEAD copy of Makefile, while the working tree copy of Makefile differs from the index copy of Makefile.

(Aside: I accidentally added two foo lines while preparing the cut and paste text. I'm not going to go back and fix it, but if you do this experiment yourself, expect slightly different outputs.)

Let's now git add this updated file, then replace foo in the first line with bar:

$ git add Makefile
$ git status --short
M  Makefile

Note that the M has moved left one column, M-space-space-Makefile, indicating that the index copy of Makefile differs from the HEAD copy, but now the index and working tree copies match. Now we do the foo-to-bar replacement:

$ ed Makefile << end
> 1s/foo/bar/
> w
> q
> end
107608
107608
$ git status --short
MM Makefile

We now have two Ms: the HEAD copy of Makefile differs from the index copy of Makefile, which differs from the working tree copy of Makefile. Running git diff --cached and git diff will show you exactly how each pairing compares.

$ git diff --cached
diff --git a/Makefile b/Makefile
index 5580859afd..8b8fc5a6d6 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,5 @@
-# The default target of this Makefile is...
+foo
+foo
 all::
 
 # Define V=1 to have a more verbose compile.
$ git diff
diff --git a/Makefile b/Makefile
index 8b8fc5a6d6..96a787d50d 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-foo
+bar
 foo
 all::
 

Now, if we run git rm --cached Makefile, this will remove the index copy of the file Makefile entirely, and git status will change accordingly. Because the index copy differs from both the HEAD copy and the working tree copy, Git demands the "force" flag as well:

$ git rm --cached Makefile
error: the following file has staged content different from both the
file and the HEAD:
    Makefile
(use -f to force removal)
$ git rm --cached -f Makefile
rm 'Makefile'
$ git status --short
D  Makefile
?? Makefile

We now have no file named Makefile in our proposed next commit in the index / staging-area. However, the file Makefile still appears (with the first line reading bar) in the working tree (inspect the file yourself to see). This Makefile is now an untracked file, so we get two output lines from git status --short: one to announce the impending demise of file Makefile in the next commit, and the other to announce the existence of the untracked file Makefile.

Without making any commit, we now use git restore --staged Makefile:

$ git restore --staged Makefile
$ git status --short
 M Makefile

The status is now space-M again, indicating that Makefile exists in the index (and therefore will be in the next commit), and furthermore, matches the HEAD copy of Makefile, so git diff --staged—which is another way to spell git diff --cached—will not show it (and indeed will show nothing). The working tree copy remains undisturbed, and still contains the extra line bar, as git diff shows:

$ git diff --staged
$ git diff
diff --git a/Makefile b/Makefile
index 5580859afd..96a787d50d 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,5 @@
-# The default target of this Makefile is...
+bar
+foo
 all::
 
 # Define V=1 to have a more verbose compile.
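If you don't want to clone the Git repository, the same sequence can be approximated in a tiny throwaway repository. This is a sketch under made-up assumptions: a one-line Makefile stands in for the real one, and printf replaces the ed sessions.

```shell
#!/bin/sh
# Sketch: the experiment above, replayed on a one-line stand-in Makefile.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

printf 'all::\n' > Makefile
git add Makefile
git commit -q -m "initial"

printf 'foo\nall::\n' > Makefile
s1=$(git status --short)            # " M Makefile": only the working tree differs
git add Makefile
s2=$(git status --short)            # "M  Makefile": index differs from HEAD
printf 'bar\nall::\n' > Makefile
s3=$(git status --short)            # "MM Makefile": all three copies differ

# Without -f, git rm --cached refuses: the staged content differs from
# both the working tree file and HEAD.
if git rm --cached Makefile 2>/dev/null; then refused=no; else refused=yes; fi

git rm -q --cached -f Makefile
s4=$(git status --short)            # "D  Makefile" and "?? Makefile"

git restore --staged Makefile
s5=$(git status --short)            # " M Makefile"

printf '%s\n' "$s1" "$s2" "$s3" "refused: $refused" "$s4" "$s5"
```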

Again, the key to understanding all of this is:

  • Every commit holds a full snapshot of every file that Git knows about.

  • This snapshot exists, at all times, in Git's index, which Git also calls the staging area, or occasionally (now mostly in the --cached flag) the cache. The --staged or --cached flag[3] generally means do something with this index / staging-area. Commands like git reset, git rm, and git add implicitly work with the index / staging-area, although flags may modify this behavior somewhat; the git restore command has the explicit --staged and --worktree flags.

  • Meanwhile, your working tree contains ordinary everyday files. These are the only files you can see and work with directly (with your editor for instance); only Git commands can see and work with the committed and index copies of files.

  • Committed copies of files can never be changed. They are in those commits forever (or as long as those commits continue to exist): they are read-only. However, the index copy of a file can be replaced wholesale, with git add, or patched, with git add -p, or removed entirely, with git rm or git rm --cached.

  • Ordinary files are, well, ordinary files: all your ordinary commands work ordinarily on the ordinary files. (And isn't it extraordinary how the ordinary word "ordinary" is now amusing?)

  • Running git commit takes all the index copies and freezes them into a new snapshot. So what you do, as you work in Git, is:

    • manipulate ordinary files, in ordinary ways;
    • git add them to update Git's index copy, to prepare the freeze; and
    • git commit the result, to freeze them for all time.

    This is the process for making a new commit, and if you change your mind and decide not to make a new commit, git restore --staged or git reset can be used to re-extract a committed copy into the index copy. But git rm removes an index copy entirely.

So if and only if removing the index copy entirely puts things back the way they were (which can happen when some file is new), then "make the index copy match the nonexistent HEAD copy, by removing it" is a correct way to do what you want. But if the HEAD commit contains a copy of the file in question, git rm --cached the-file is wrong.
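The one case where all three commands agree, a newly added file that HEAD lacks, can be sketched as follows. The names base.txt and new.txt are hypothetical; base.txt exists only so that there is a HEAD commit to compare against.

```shell
#!/bin/sh
# Sketch: for a file that HEAD lacks, all three commands just unstage it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > base.txt
git add base.txt
git commit -q -m "initial"

echo new > new.txt

git add new.txt                      # status would show "A  new.txt"
git rm -q --cached new.txt
s_rm=$(git status --short)           # "?? new.txt"

git add new.txt
git restore --staged new.txt
s_restore=$(git status --short)      # "?? new.txt"

git add new.txt
git reset -q new.txt
s_reset=$(git status --short)        # "?? new.txt"

printf '%s\n' "$s_rm" "$s_restore" "$s_reset"
```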


[3] Note that --cached and --staged have the same meaning for git diff. For git rm, however, there's simply no --staged option at all. Why? That's a question for the Git developers, but we can note that historically, in the distant past, git diff did not have --staged either. My best guess is therefore that it was an oversight: when whoever added --staged to git diff did it, they forgot to add --staged to git rm too.
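A quick way to check the flag behavior on your own Git version is sketched below (the repository and file.txt are made up). It compares the two git diff spellings and then merely records whether git rm accepts --staged; that second result may depend on the Git version you run, so treat it as something to observe rather than a guarantee.

```shell
#!/bin/sh
# Sketch: --cached vs --staged for git diff, and a probe of git rm --staged.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo v1 > file.txt
git add file.txt
git commit -q -m "initial"
echo v2 > file.txt
git add file.txt

cached=$(git diff --cached)
staged=$(git diff --staged)
if [ "$cached" = "$staged" ]; then same_diff=yes; else same_diff=no; fi

# Probe only: record whether this Git accepts "git rm --staged".
if git rm -q --staged file.txt 2>/dev/null; then
    rm_staged=accepted
else
    rm_staged=rejected
fi

echo "git diff --cached and --staged agree: $same_diff"
echo "git rm --staged: $rm_staged"
```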

answered Sep 28 '22 01:09 by torek