I'm new to git and just started learning from this site: https://dev.to/unseenwizzard/learn-git-concepts-not-commands-4gjc
My question is about this part of the article:
Have a look at Alice.txt.
It actually contains some text, but Bob.txt doesn't, so let's change that and put Hi!! I'm Bob. I'm new here. in there.
To summarize, we made a change to Bob.txt with this sentence: Hi!! I'm Bob. I'm new here.
Then we staged this change but did not commit it.
Later on, we went ahead and made another change to the same sentence by removing the extra ! after Hi.
So now the sentence looks like this: Hi! I'm Bob. I'm new here.
Now we staged and committed this new change.
My question is: is the previous modification to Bob.txt (with the two !!) lost from the staging area?
I ran git status, but it doesn't mention the previous change. Can I go back and commit the change with the two "!!"?
VonC's answer and Romain Valeri's addition to that answer are both correct, but might be hard to visualize or understand in various ways. Here's a way to understand it that might be at least somewhat intuitive.
When you're working with a commit in Git, there are up to three copies of each file. Once you know that a commit contains a full snapshot of every file, in a read-only (and Git-only, compressed, and de-duplicated) format, you will see why it's necessary to have two copies:
    Git's read-only copy in HEAD     your working tree copy
    ----------------------------     ----------------------
    file1.ext                        file1.ext
    file2.ext                        file2.ext
    readme.md                        readme.md
Even if the contents of file1.ext in the commit match the contents of file1.ext in your working tree (your working tree being where you get to see and work on/with the files that Git extracted from the commit), the Git copy is in some special, weird, compressed, and read-only format. Only Git itself can even read this file, and nothing, not even Git itself, can overwrite it.¹ Your working tree contains ordinary everyday files that everyone can read and write in the ordinary way, so Git really does have to copy each file out every time you check out the commit.
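To make "compressed and read-only" a bit more concrete, here is a minimal Python sketch of how Git names and stores one file's contents as a loose blob object. Git prepends a header of the form blob <size> and a NUL byte, takes the SHA-1 of the result as the object's name, and zlib-compresses the result for storage. The sketch assumes the default SHA-1 object format (repositories created with git init --object-format=sha256 use SHA-256 instead). The helper names are made up for illustration and are not Git's own code.

```python
import hashlib
import zlib


def blob_bytes(content: bytes) -> bytes:
    """Raw (uncompressed) loose-object form of a blob: header, NUL, then the data."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return header + content


def blob_id(content: bytes) -> str:
    """Hypothetical helper: the SHA-1 object name Git would give this content."""
    return hashlib.sha1(blob_bytes(content)).hexdigest()


def loose_blob(content: bytes) -> bytes:
    """Hypothetical helper: the zlib-compressed bytes written to a loose object file."""
    return zlib.compress(blob_bytes(content))


data = b"hello\n"
oid = blob_id(data)
print(oid)  # same value as: echo hello | git hash-object --stdin
# Identical content always produces the identical name, which is what lets
# Git keep only one copy of it (de-duplication).
print(blob_id(b"hello\n") == oid)
# A loose object lives at .git/objects/<first 2 hex digits>/<remaining 38>.
print(f".git/objects/{oid[:2]}/{oid[2:]}")
```

Because the name is computed from the content, changing even one byte of a stored object would make it disagree with its own name. That is why Git treats such objects as read-only.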
This same principle holds for other version control systems too: in Mercurial, or SVN, or CVS or ClearCase or whatever, you'll often find multiple copies of files (the precise details depend greatly on the VCS). What's particularly weird about Git, though, is that instead of just two copies of each file, Git provides three:
    HEAD (r/o)     staging area     working tree
    ----------     ------------     ------------
    file1.ext      file1.ext        file1.ext
    file2.ext      file2.ext        file2.ext
    readme.md      readme.md        readme.md
The HEAD copy is in a commit and cannot be changed, so that's that. The working tree copy is yours to play with as you wish. The weird thing is this extra copy sitting between the HEAD and working-tree versions.
You can't see this extra version, at least not easily.² But it's there. Why? Well, one answer to that is: "for no reason" (after all, those other version control systems don't do this).³ But Git does do it, and Git has a reason other than just "to be different". In particular, the existence of this "staging area" copy allows you to use git add -p to partially add a file. There's a whole set of these partial operations (git reset -p, git checkout -p, etc.), and I personally am not a huge fan of these, but they do exist, and they are often used as justification for the staging area's existence.
The data stored in the staging area are secretly⁴ in the same read-only, compressed, and de-duplicated form that Git uses inside commits. What this does for Git itself is make git commit go very fast (compared to all those other VCSes, anyway). When you run git commit, the copies of files that are in the staging area are ready to be committed. There's almost no extra work required.⁵ When you run git checkout, Git pre-fills the index / staging area (these are two terms for the same thing, in Git) with all the files from the commit you checked out. As you run git add, Git converts the working-tree copy of the file into its internal compressed format and checks whether that content already exists as an internal object. It re-uses the existing object if so and saves a new one if not, so that the file is ready to go and Git can just update its index entry / staging copy with the new-or-reused internal hash ID.⁶ That means that the index / staging area is now ready to be committed.
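Here is a rough Python model of that git add step. It simplifies heavily: the object database is a dict from hash ID to compressed bytes, and the index is a dict from path to hash ID. Real Git stores much more per index entry, such as the file mode and cached stat data. ObjectStore and add_file are hypothetical names used only for this sketch.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy stand-in for .git/objects: maps hash ID -> compressed object bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def write_blob(self, content: bytes) -> str:
        raw = b"blob %d\0" % len(content) + content
        oid = hashlib.sha1(raw).hexdigest()
        # De-duplication: if an object with this name already exists, re-use it.
        if oid not in self.objects:
            self.objects[oid] = zlib.compress(raw)
        return oid


def add_file(store: ObjectStore, index: dict[str, str], path: str, content: bytes) -> None:
    """Model of `git add <path>`: store the content, then point the index entry at it."""
    index[path] = store.write_blob(content)


store = ObjectStore()
index: dict[str, str] = {}
add_file(store, index, "file1.ext", b"same text\n")
add_file(store, index, "file2.ext", b"same text\n")  # identical content
print(len(store.objects))                     # 1: one object shared by two entries
print(index["file1.ext"] == index["file2.ext"])  # True
```

Running git commit afterwards mostly means writing down the names already in the index. This is why the "almost no extra work" claim above holds.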
If we put this together, we see the following:
full-commit-checkout (or git switch): fills in the index / staging area and your working tree:

    HEAD           staging        working tree
    ----           -------        ------------
    file1.ext  ->  file1.ext  ->  file1.ext
    file2.ext  ->  file2.ext  ->  file2.ext
    readme.md  ->  readme.md  ->  readme.md
git add: copies from the working tree to the index / staging area. Let's say we just git add file2.ext:

    HEAD           staging        working tree
    ----           -------        ------------
    file1.ext      file1.ext      file1.ext
    file2.ext      file2.ext  <-  file2.ext
    readme.md      readme.md      readme.md
Now various other operations start to make sense too:
git rm --cached: removes the index copy, leaving everything else untouched. Let's say we git rm --cached file2.ext:

    HEAD           staging        working tree
    ----           -------        ------------
    file1.ext      file1.ext      file1.ext
    file2.ext                     file2.ext
    readme.md      readme.md      readme.md
git reset, in one of its modes: restores the staged copy (only) from the HEAD copy, with git reset -- file2.ext for instance:

    HEAD           staging        working tree
    ----           -------        ------------
    file1.ext      file1.ext      file1.ext
    file2.ext  ->  file2.ext      file2.ext
    readme.md      readme.md      readme.md
git rm without --cached: removes both the index and working tree copies:

    HEAD           staging        working tree
    ----           -------        ------------
    file1.ext      file1.ext      file1.ext
    file2.ext
    readme.md      readme.md      readme.md
git restore: restores the staging copy, the working-tree copy, or both, depending on flags; let's say we now run git restore -SW file2.ext to restore both:

    HEAD           staging        working tree
    ----           -------        ------------
    file1.ext      file1.ext      file1.ext
    file2.ext  ->  file2.ext  ->  file2.ext
    readme.md      readme.md      readme.md
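The copy operations in these diagrams can be summarized as a toy Python model. Each of the three copies is a plain dict from path to file content, and each method is named after the command whose copying it imitates. The model ignores the object database, file modes, and the safety checks real Git performs (for example, a real git rm refuses to remove a file with uncommitted changes unless you use -f). The class and method names are hypothetical.

```python
class ThreeCopies:
    """Toy model of HEAD (read-only), the index / staging area, and the working tree."""

    def __init__(self, commit: dict[str, str]) -> None:
        self.head = dict(commit)  # the methods below never modify this
        self.index: dict[str, str] = {}
        self.work: dict[str, str] = {}
        self.switch()

    def switch(self) -> None:
        """Whole-commit checkout / git switch: HEAD -> index and working tree."""
        self.index = dict(self.head)
        self.work = dict(self.head)

    def add(self, path: str) -> None:
        """git add path: working tree -> index."""
        self.index[path] = self.work[path]

    def rm_cached(self, path: str) -> None:
        """git rm --cached path: remove the index copy only."""
        del self.index[path]

    def reset(self, path: str) -> None:
        """git reset -- path: HEAD -> index; the working tree is untouched."""
        self.index[path] = self.head[path]

    def rm(self, path: str) -> None:
        """git rm path: remove both the index and working-tree copies."""
        self.index.pop(path, None)
        self.work.pop(path, None)

    def restore_sw(self, path: str) -> None:
        """git restore -SW path: HEAD -> index and working tree."""
        self.index[path] = self.head[path]
        self.work[path] = self.head[path]


repo = ThreeCopies({"file1.ext": "one", "file2.ext": "two", "readme.md": "docs"})
repo.work["file2.ext"] = "two, edited"  # you edit the working-tree copy
repo.add("file2.ext")                   # index now holds "two, edited"
repo.rm_cached("file2.ext")             # index copy gone; edit still in working tree
repo.reset("file2.ext")                 # index back to HEAD's "two"
repo.restore_sw("file2.ext")            # both copies from HEAD; the edit is discarded
repo.rm("file2.ext")                    # both copies gone; HEAD still has it
print("file2.ext" in repo.head, "file2.ext" in repo.index, "file2.ext" in repo.work)
```

Note that nothing in the model can ever write to head. Only making a new commit (not modeled here) changes what HEAD holds.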
The git checkout command has modes that emulate two of the things that git restore can do: it can either copy from staging to working tree, or from HEAD to both staging and working tree.⁷ This is kind of dangerous, since these operations overwrite the working tree copy even if you never saved it anywhere. That makes using git switch instead of git checkout "safer", since you won't get this destructive mode of operation by accident.⁸
Hence, the short answer (too late) is that your second git add overwrote the staging copy that your first git add wrote, throwing away that earlier staging copy. It's now very hard to get back.
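The question's scenario can be replayed in a minimal Python sketch. It uses a dict-based object store and a dict-based index (hypothetical names, not Git's code), stages Bob.txt twice, and then checks what the index still refers to. The first blob is still sitting in the object store, but nothing points to it any more. That is the kind of unreferenced object footnote 6 hints at, and git fsck reports such objects as dangling blobs.

```python
import hashlib

objects: dict[str, bytes] = {}  # toy object database: hash ID -> content
index: dict[str, str] = {}      # toy index: path -> hash ID


def git_add(path: str, content: bytes) -> str:
    """Toy `git add`: write the blob (if new) and repoint the index entry."""
    oid = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
    objects.setdefault(oid, content)
    index[path] = oid
    return oid


first = git_add("Bob.txt", b"Hi!! I'm Bob. I'm new here.\n")
second = git_add("Bob.txt", b"Hi! I'm Bob. I'm new here.\n")

print(index["Bob.txt"] == second)  # True: the index now names only the new content
print(first in objects)            # True: the old blob still exists...
print(first in index.values())     # False: ...but nothing in the index refers to it
```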
¹Technically, as long as the file is stored as what Git calls a loose object, it's not that hard to read: open the underlying object with any zlib decompression program and decompress it, then discard the header that Git added. But just finding the object alone is a pain in the keister, and then it could be the opposite of "loose", which is not "tight" but rather packed, and then you're really in trouble.
Overwriting the file is physically possible, but because the object's name is a cryptographic checksum of the object's data, overwriting the file simply damages the data to the point where Git will say "this object is corrupt" and refuse to extract it at all. You'll know that the repository is damaged and that you should find some other clone that is undamaged.
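Both halves of this footnote can be sketched in a few lines of Python: the recipe (zlib-decompress, then drop the header) and the point about corruption. The sketch assumes the default SHA-1 format and a loose object you have already located at .git/objects/<2 hex digits>/<38 hex digits>. Packed objects use a different format and are not handled. read_loose_object is a hypothetical helper name.

```python
import hashlib
import zlib


def read_loose_object(compressed: bytes, expected_id: str) -> tuple[str, bytes]:
    """Decompress a loose object, check it against its name, and strip the header.

    Returns (object type, content). Raises ValueError if the data no longer
    hashes to the name it is stored under. (Random byte damage will often make
    zlib.decompress itself raise zlib.error first.)
    """
    raw = zlib.decompress(compressed)
    if hashlib.sha1(raw).hexdigest() != expected_id:
        raise ValueError(f"object {expected_id} is corrupt")
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError(f"object {expected_id} has the wrong size")
    return obj_type, content


# Build a loose blob the way Git does, then read it back.
raw_blob = b"blob 6\0hello\n"
oid = hashlib.sha1(raw_blob).hexdigest()
stored = zlib.compress(raw_blob)
print(read_loose_object(stored, oid))  # ('blob', b'hello\n')

# Simulate overwriting the object file with different data under the same name.
tampered = zlib.compress(b"blob 6\0HELLO\n")
try:
    read_loose_object(tampered, oid)
except ValueError as err:
    print(err)  # the "is corrupt" message
```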
²To see it not-easily, run git ls-files --stage; be aware that this dumps out a lot of output in a big repository.
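If you want to process that output programmatically, each line has the form <mode> <object> <stage>, then a tab, then the path. Below is a minimal Python sketch that parses such lines. The sample text is made up: the empty-blob hash is real, but the paths are hypothetical. File names containing unusual characters are quoted in this output unless you add -z, and this sketch does not handle that.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    mode: str   # e.g. 100644 for a regular file, 100755 for an executable
    oid: str    # hash ID of the staged blob
    stage: int  # 0 normally; 1-3 only while a merge conflict is unresolved
    path: str


def parse_ls_files_stage(text: str) -> list[IndexEntry]:
    """Parse `git ls-files --stage` output (newline-terminated, not -z)."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        mode, oid, stage = meta.split(" ")
        entries.append(IndexEntry(mode, oid, int(stage), path))
    return entries


sample = (
    "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tfile1.ext\n"
    "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tfile2.ext\n"
)
for entry in parse_ls_files_stage(sample):
    print(entry.path, entry.oid[:7], entry.stage)
```

The two entries share one object ID because both (hypothetical) files are empty. This is the de-duplication described above, as seen from the index.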
³Mercurial, for instance, literally doesn't, but does have a hidden thing called the "dirstate" that does some of what Git's index does. The user-oriented difference between Mercurial's dirstate and Git's index, though, is that you don't even have to know that the dirstate exists. Git shoves its index / staging area in your face now and then: Look! I have this extra copy! Isn't it cool? Look, see! and you really have to be aware of it.
⁴Using git ls-files --stage exposes this "secret", so it's not really all that secret. But you don't need to know this unless you start using git ls-files --stage yourself, and/or couple that with use of git update-index.
⁵The one bit of extra work required is that Git has to run the internal equivalent of git write-tree. This saves the files' names and modes. The data (the files' contents) are already "pre-saved", as Romain Valeri noted.
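As a rough illustration of what that write-tree step adds, here is a Python sketch that builds a tree object from index-style (mode, name, blob ID) entries. It uses Git's tree encoding: for each entry, the mode and name as text, a NUL byte, and then the 20 raw bytes of the blob's SHA-1, all behind a tree <size> header. The sketch assumes a flat list of files with no subdirectories. Real Git also writes sub-trees and sorts directory names as if they ended in "/". build_tree is a hypothetical name.

```python
import hashlib


def build_tree(entries: list[tuple[str, str, str]]) -> tuple[str, bytes]:
    """Encode flat (mode, name, hex blob ID) entries as a Git tree object.

    Returns (tree hash ID, raw uncompressed object bytes).
    Assumes SHA-1 object names and no subdirectories.
    """
    body = b""
    # For plain files, Git orders tree entries by name, compared as bytes.
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    raw = b"tree %d\0" % len(body) + body
    return hashlib.sha1(raw).hexdigest(), raw


# The file contents already exist as blobs; only names and modes are new here.
empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
tree_id, raw = build_tree([
    ("100644", "readme.md", empty_blob),
    ("100644", "file1.ext", empty_blob),
])
print(tree_id)
print(build_tree([])[0])  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```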
⁶Exercise for the reader: what if you git add some content and then never commit it, e.g., by overwriting it with new content? There's an internal Git object that never seems to get used here. Look at the git gc documentation to see what eventually happens.
⁷The git restore command has the ability to copy from HEAD to the working tree, skipping right over the staging copy, if you like; git checkout can't do this. Whether you ever want to do this, I don't know: that will be up to you. But if you decide that you do want to do this, remember that git restore is more capable than this other mode of git checkout.
⁸The "destructive mode by accident" thing no longer happens in Git 2.23 and later, which now notes that your git checkout zorg request was ambiguous. Here's where it does happen in older versions of Git. You have a remote-tracking name origin/zorg, and your project also contains a file named zorg. You're on branch develop, and you've run git checkout develop and gotten the develop-branch-tip copy of the file zorg. You then spend an hour editing that file, and decide you want to switch to the zorg branch. So you run git checkout zorg.
You don't have a zorg branch, but you do have an origin/zorg, and you were expecting Git to create a new branch zorg from origin/zorg and switch to it, if that was safe, or give you an error reminding you to stash or commit your files. But instead, Git says: oh, you want me to erase your last hour's work on the file zorg, and hence extracts the staging copy of file zorg to your working tree.
Had you used git switch zorg, Git would have known that you meant to create a new branch, and would have tried that safely. But instead, Git destroyed your work. Bummer! Just don't go kill a bunch of people (or even Mangalores) to vent your frustration, OK?