
What happens to previous "staged but not committed" changes?

Tags:

git

github

I'm new to git and just started learning from this site: https://dev.to/unseenwizzard/learn-git-concepts-not-commands-4gjc

My doubt is in this part of the document:

Have a look at Alice.txt.
It actually contains some text, but Bob.txt doesn't, so lets change that and put Hi!! I'm Bob. I'm new here. in there.

To summarize, we make a change to Bob.txt with this sentence:- Hi!! I'm Bob. I'm new here.
Then we staged this change but did not commit it.
Later on, we went ahead and made another change to the same sentence by removing the extra ! after Hi.

So now the sentence looks like this:- Hi! I'm Bob. I'm new here.
Now we staged and committed this new change.

My doubt is: Is the previous modification to Bob.txt (with the two !!) lost from the staging area? I ran git status but it doesn't mention the previous change. Can I go back and commit the change with the two "!!" ?

seffersonjtarship asked Dec 17 '22 11:12


1 Answer

VonC's answer and Romain Valeri's addition to that answer are both correct, but might be hard to visualize or understand in various ways. Here's a way to understand it that might be at least somewhat intuitive.

When you're working with a commit in Git, there are up to three copies of each file. Once you know that a commit contains a full snapshot of every file, in a read-only (and Git-only, compressed, and de-duplicated) format, you will see why it's necessary to have two copies:

Git's read-only copy in HEAD        your working tree copy
----------------------------        ----------------------
file1.ext                           file1.ext
file2.ext                           file2.ext
readme.md                           readme.md

Even if the contents of file1.ext in the commit match the contents of file1.ext in your working tree (your working tree being where you get to see and work on/with the files that Git extracted from the commit), the Git copy is in some special, weird, compressed, and read-only format. Only Git itself can even read this file, and nothing, not even Git itself, can overwrite it.[1] Your working tree contains ordinary everyday files that everyone can read and write in the ordinary way, so Git really does have to copy each file out every time you check out the commit.

This same principle holds for other version control systems too: in Mercurial, or SVN, or CVS or ClearCase or whatever, you'll often find multiple copies of files (the precise details depend greatly on the VCS). What's particularly weird about Git, though, is that instead of just two copies of each file, Git provides three:

HEAD (r/o)    staging area    working tree
----------    ------------    ------------
file1.ext     file1.ext       file1.ext
file2.ext     file2.ext       file2.ext
readme.md     readme.md       readme.md

The HEAD copy is in a commit and cannot be changed, so that's that. The working tree copy is yours to play with as you wish. The weird thing is this extra copy sitting between the HEAD and working-tree versions.
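To make the three copies concrete, here is a minimal sketch that puts a different version of the same file in each place and reads all three back. The temporary repository, the "Demo" identity, and the "version N" contents are made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"; git config user.email "demo@example.com"

echo "version 1" > file1.ext
git add file1.ext
git commit -q -m "initial"            # HEAD copy now holds "version 1"

echo "version 2" > file1.ext
git add file1.ext                     # staging copy now holds "version 2"

echo "version 3" > file1.ext          # working tree copy holds "version 3"

head_copy=$(git show HEAD:file1.ext)  # read the copy stored in the commit
staged_copy=$(git show :file1.ext)    # read the copy stored in the index
work_copy=$(cat file1.ext)            # read the ordinary file

echo "HEAD: $head_copy | staging: $staged_copy | working tree: $work_copy"
```

The `:file1.ext` syntax (a leading colon with no revision) is how Git names the index copy of a path.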

You can't see this extra version, at least not easily.[2] But it's there. Why? Well, one answer to that is: "for no reason". After all, those other version control systems don't do this.[3] But Git does do it, and Git has a reason other than just "to be different". In particular, the existence of this "staging area" copy allows you to use git add -p to partially add a file. There's a whole set of these partial operations (git reset -p, git checkout -p, etc.), and I personally am not a huge fan of these, but they do exist, and are often used as justification for the staging area's existence.

The data stored in the staging area are secretly[4] in the same read-only, compressed, and de-duplicated form that Git uses inside commits. What this does for Git itself is make git commit go very fast (compared to all those other VCSes, anyway). When you run git commit, the copies of files that are in the staging area are ready to be committed. There's almost no extra work required.[5] When you run git checkout, Git pre-fills the index/staging-area (these are two terms for the same thing, in Git) with all the files from the commit you checked out. As you run git add, Git:

  • compresses and hashes the working tree file's content;
  • checks to see if that's a duplicate; and
  • if it's a duplicate, re-uses the old one, otherwise saves the new one

so that the file is ready to go and Git can just update its index entry / staging copy with the new-or-reused internal hash ID.[6] That means that the index/staging-area is now ready to be committed.
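The hash-and-reuse steps above can be observed directly. This is a small sketch, assuming a throwaway repository and two hypothetical files, a.txt and b.txt, with identical content: both index entries end up with the same hash ID, and only one object is stored.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q

echo "same content" > a.txt
cp a.txt b.txt

# Without -w, hash-object only computes the ID; nothing is stored yet.
expected=$(git hash-object a.txt)

git add a.txt b.txt

# ls-files --stage prints: <mode> <hash> <stage><TAB><path>
id_a=$(git ls-files --stage a.txt | cut -d' ' -f2)
id_b=$(git ls-files --stage b.txt | cut -d' ' -f2)

# count-objects prints "<N> objects, <M> kilobytes" for loose objects.
loose=$(git count-objects | cut -d' ' -f1)

git ls-files --stage
echo "loose objects stored: $loose"
```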

If we put this together, we see the following:

  • full-commit-checkout (or git switch): fills in the index / staging-area and your working tree:

    HEAD         staging      working tree
    ----         ----------   -------------
    file1.ext -> file1.ext -> file1.ext
    file2.ext -> file2.ext -> file2.ext
    readme.md -> readme.md -> readme.md
    
  • git add: copies from working tree to index / staging-area. Let's say we just git add file2.ext:

    HEAD         staging      working tree
    ----         ----------   -------------
    file1.ext    file1.ext    file1.ext
    file2.ext    file2.ext <- file2.ext
    readme.md    readme.md    readme.md
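One way to watch git add do the copy in the second diagram is to compare the staging area against HEAD before and after. This is a sketch in a throwaway repository; the file contents are invented.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"; git config user.email "demo@example.com"

echo "one" > file1.ext; echo "two" > file2.ext; echo "docs" > readme.md
git add .
git commit -q -m "initial"                  # all three columns now match

echo "two, edited" > file2.ext              # only the working tree changes

before=$(git diff --cached --name-only)     # staging vs HEAD: nothing yet
git add file2.ext                           # working tree -> staging
after=$(git diff --cached --name-only)      # staging vs HEAD: file2.ext
unstaged=$(git diff --name-only)            # working tree vs staging: nothing

echo "staged before add: '$before', after add: '$after', unstaged: '$unstaged'"
```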
    

Now various other operations start to make sense too:

  • git rm --cached: removes the index copy, leaving everything else untouched. Let's say we git rm --cached file2.ext:

    HEAD         staging      working tree
    ----         ----------   -------------
    file1.ext    file1.ext    file1.ext
    file2.ext                 file2.ext
    readme.md    readme.md    readme.md
    
  • git reset, in one of its modes: restores the staged copy (only) from the HEAD copy, with git reset -- file2.ext for instance:

    HEAD         staging      working tree
    ----         ----------   -------------
    file1.ext    file1.ext    file1.ext
    file2.ext -> file2.ext    file2.ext
    readme.md    readme.md    readme.md
    
  • git rm without --cached: removes both index and working tree copies:

    HEAD         staging      working tree
    ----         ----------   -------------
    file1.ext    file1.ext    file1.ext
    file2.ext
    readme.md    readme.md    readme.md
    
  • git restore: restores staging, working-tree, or both, depending on flags; let's say we now run git restore -SW file2.ext to restore both:

    HEAD         staging      working tree
    ----         ----------   -------------
    file1.ext    file1.ext    file1.ext
    file2.ext -> file2.ext -> file2.ext
    readme.md    readme.md    readme.md
    

The git checkout command has modes that emulate two of the things that git restore can do: it can either copy from staging to working tree, or from HEAD to both staging and working-tree.[7] This is kind of dangerous since these operations overwrite the working tree copy even if you never saved it anywhere. That makes using git switch instead of git checkout "safer", since you won't get this destructive mode of operation by accident.[8]
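Here is a sketch of those two git checkout modes, run in a throwaway repository with invented contents. Note that the "unsaved work" text is overwritten by the first command without any warning.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"; git config user.email "demo@example.com"
echo "committed" > file2.ext
git add file2.ext
git commit -q -m "initial"

echo "staged" > file2.ext
git add file2.ext                       # staging copy: "staged"
echo "unsaved work" > file2.ext         # working tree only, saved nowhere

# Mode 1: staging -> working tree. "unsaved work" is silently overwritten.
git checkout -q -- file2.ext
after_mode1=$(cat file2.ext)

# Mode 2: HEAD -> staging and working tree.
git checkout -q HEAD -- file2.ext
after_mode2_work=$(cat file2.ext)
after_mode2_index=$(git show :file2.ext)

echo "mode 1: $after_mode1 | mode 2: $after_mode2_work / $after_mode2_index"
```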

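Hence, the short answer (too late) is that your second git add overwrote the staging copy that your first git add wrote, throwing away that earlier staging copy. The old content may linger inside Git as an unreferenced object until garbage collection removes it, but neither the index nor any commit points to it any more, so it's now very hard to get back.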
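The scenario from the question can be replayed as a sketch like the one below, run as a script in a throwaway repository. It also shows one possible way to dig the overwritten content back out, using git fsck to list unreferenced ("dangling") blobs. This only works so neatly because the demo repository has exactly one such blob and garbage collection has not run; in a real repository you would likely have to inspect several candidates.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"; git config user.email "demo@example.com"

: > Bob.txt                                   # Bob.txt starts out empty
git add Bob.txt
git commit -q -m "Add empty Bob.txt"

echo "Hi!! I'm Bob. I'm new here." > Bob.txt
git add Bob.txt                               # first staging copy (two !!)

echo "Hi! I'm Bob. I'm new here." > Bob.txt
git add Bob.txt                               # overwrites the staging copy
git commit -q -m "Introduce Bob"

committed=$(git show HEAD:Bob.txt)

# The first staged blob is no longer referenced by the index or any commit.
# fsck reports such objects as lines of the form "dangling blob <hash>".
dangling=$(git fsck 2>/dev/null | awk '$1 == "dangling" && $2 == "blob" { print $3 }')
recovered=$(git cat-file -p "$dangling")

echo "committed: $committed"
echo "recovered: $recovered"
```

To actually commit the two-"!!" version afterwards, you would write the recovered text back into Bob.txt, then git add and git commit it as a new change.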


[1] Technically, as long as the file is stored as what Git calls a loose object, it's not that hard to read: open the underlying object with any zlib decompression program and decompress it, then discard the header that Git added. But just finding the object alone is a pain in the keister, and then it could be the opposite of "loose", which is not "tight" but rather packed, and then you're really in trouble.

Overwriting the file is physically possible, but because the object's name is a cryptographic checksum of the object's data, overwriting the file simply damages the data to the point where Git will say "this object is corrupt" and refuse to extract it at all. You'll know that the repository is damaged and that you should find some other clone that is undamaged.
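As a rough check of the "name is a checksum of the data" claim, the sketch below hashes a blob by hand and compares the result with git hash-object. It assumes the default SHA-1 object format and that a sha1sum program is available (on some systems the equivalent tool is named differently); the header layout, "blob <size>" followed by a NUL byte, is the one Git prepends to blob data.

```shell
set -eu
content="Hi! I'm Bob. I'm new here."

# Size in bytes of the data, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>\0" followed by the data itself.
by_hand=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

# Ask Git for the object ID of the same data.
by_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "by hand: $by_hand"
echo "by git:  $by_git"
```

Changing even one byte of the stored data would change the checksum, which is why an overwritten object is detected as corrupt.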

[2] To see it not-easily, run git ls-files --stage; be aware that this dumps out a lot of output in a big repository.

[3] Mercurial, for instance, literally doesn't, but does have a hidden thing called the "dirstate" that does some of what Git's index does. The user-oriented difference between Mercurial's dirstate and Git's index, though, is that you don't even have to know that the dirstate exists. Git shoves its index / staging-area in your face now and then: Look! I have this extra copy! Isn't it cool? Look, see! and you really have to be aware of it.

[4] Using git ls-files --stage exposes this "secret", so it's not really all that secret. But you don't need to know this unless you start using git ls-files --stage yourself, and/or couple that with use of git update-index.

[5] The one bit of extra work required is that Git has to run the internal equivalent of git write-tree. This saves the files' names and modes. The data (the files' content) are already "pre-saved", as Romain Valeri noted.
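A small sketch of that last step, in a throwaway repository: running git write-tree by hand on the staged files produces the same tree object that the subsequent git commit ends up pointing to.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"; git config user.email "demo@example.com"

echo "hello" > readme.md
git add readme.md                       # content is already stored as a blob

tree=$(git write-tree)                  # store the index's names and modes as a tree
git ls-tree "$tree"                     # prints mode, type, hash, and name per entry
names=$(git ls-tree --name-only "$tree")

git commit -q -m "initial"
committed_tree=$(git rev-parse 'HEAD^{tree}')

echo "write-tree: $tree"
echo "commit's tree: $committed_tree"
```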

[6] Exercise for the reader: what if you git add some content and then never commit it, e.g., by overwriting it with new content? There's an internal Git object that never seems to get used here. Look at the git gc documentation to see what eventually happens.

[7] The git restore command has the ability to copy from HEAD to working-tree, skipping right over the staging copy, if you like; git checkout can't do this. Whether you ever want to do this, I don't know: that will be up to you. But if you decide that you do want to do this, remember that git restore is more capable than this other mode of git checkout.
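A sketch of that HEAD-to-working-tree copy, assuming Git 2.23 or later and a throwaway repository: the working tree file is replaced by the committed version while the staged version is left alone.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo"; git config user.email "demo@example.com"
echo "committed" > file2.ext
git add file2.ext
git commit -q -m "initial"

echo "staged" > file2.ext
git add file2.ext                       # staging copy: "staged"
echo "scratch" > file2.ext              # working tree copy: "scratch"

# Copy HEAD's version into the working tree only, skipping the index.
git restore --source=HEAD --worktree file2.ext

work=$(cat file2.ext)
staged=$(git show :file2.ext)
echo "working tree: $work | staging: $staged"
```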

[8] The "destructive mode by accident" thing no longer happens in Git 2.23 and later, which now notes that your git checkout zorg request was ambiguous. Here's where it does happen in older versions of Git:

  • Suppose you have a branch named origin/zorg.
  • Suppose you also have a file named zorg, and you've run git checkout develop and gotten the develop-branch-tip copy of the file.
  • Suppose now that you spent the last hour working on your evil plan to fire cab drivers.
  • Now you take a coffee break (and carefully don't choke on a cherry). When you come back you think: Wait, I wanted to be on the zorg branch. So you run git checkout zorg.

You don't have a zorg branch, but you do have an origin/zorg, and you were expecting Git to create a new branch zorg from origin/zorg and switch to it, if that was safe, or give you an error reminding you to stash or commit your files. But instead, Git says: "oh, you want me to erase your last hour's work on the file zorg", and extracts the staging copy of file zorg to your working tree.

Had you used git switch zorg, Git would have known that you meant to create a new branch, and would have tried that safely. But instead, Git destroyed your work. Bummer! Just don't go kill a bunch of people (or even Mangalores) to vent your frustration, OK? 😀
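The safe path can be sketched as follows, assuming Git 2.23 or later. The two directories named origin and work, and the file contents, are invented for the demo; a local clone stands in for a real remote. The sketch deliberately does not reproduce the destructive git checkout behavior of older versions.

```shell
set -eu
base=$(mktemp -d)

# A stand-in "remote" with a file named zorg and a branch named zorg.
git init -q "$base/origin"
cd "$base/origin"
git config user.name "Demo"; git config user.email "demo@example.com"
echo "plan v1" > zorg
git add zorg
git commit -q -m "initial"
git branch zorg

# The clone has origin/zorg but no local zorg branch.
cd "$base"
git clone -q origin work
cd work

echo "an hour of work" > zorg           # uncommitted edit to the file zorg

# git switch only deals in branches, so zorg can never mean the file here.
git switch -q zorg

branch=$(git branch --show-current)
content=$(cat zorg)
echo "on branch: $branch | file zorg: $content"
```

Because the new branch starts at the same commit, the uncommitted edit is carried along instead of being overwritten.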

torek answered Jan 14 '23 11:01