Simply put, the git add command is like a photographer positioning all people for a group photo while the git commit is like the photographer actually snapping the picture. The staging area allows us to preview changes before they are finalized.
Staging is a step before the commit process in git. That is, a commit in git is performed in two steps: staging and actual commit. As long as a changeset is in the staging area, git allows you to edit it as you like (replace staged files with other versions of staged files, remove changes from staging, etc.).
Staging lets you select the files you have finished working on. It collects all the changes that are to be committed together. The commit then records only these staged changes and leaves the other files alone.
In the staging area, you collect the changes that make up the next commit. So if you modify one file and add its changes to the index, that one file is staged, while all the other already committed files are still tracked files.
When you commit, Git only commits the changes in the index (the "staged" files). There are many uses for this, but the most obvious is to break up your working changes into smaller, self-contained pieces. Perhaps you fixed a bug while you were implementing a feature. You can git add just that file (or git add -p to add just part of a file!) and then commit that bugfix before committing everything else. If you are using git commit -a then you are just forcing an add of every modified tracked file right before the commit. Don't use -a if you want to take advantage of staging files.
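As a minimal sketch of the bugfix-while-building-a-feature case above (the throwaway repository, the file names feature.txt and bugfix.txt, and the demo identity are all made up for illustration):

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# Starting point: two tracked files.
echo "feature v1" > feature.txt
echo "buggy" > bugfix.txt
git add feature.txt bugfix.txt
git commit -q -m "Initial commit"

# Work on both at once: a half-done feature and a finished bug fix.
echo "feature v2 (in progress)" > feature.txt
echo "fixed" > bugfix.txt

# Stage and commit only the bug fix.
git add bugfix.txt
git commit -q -m "Fix bug"

# Files recorded in the new commit vs. files still pending in the work-tree.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
pending=$(git diff --name-only)
echo "committed: $committed"
echo "pending:   $pending"
```

The feature work stays uncommitted in the work-tree until you decide to stage it.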
You can also treat the staged files as an intermediate working copy by passing the --cached option to many commands. For example, git diff --cached will show you how the stage differs from HEAD so you can see what you're about to commit without mixing in your other working changes.
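A small sketch of the difference between the two diffs, assuming a scratch repository with two hypothetical files a.txt and b.txt:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Change both files, but stage only a.txt.
echo "two" > a.txt
echo "two" > b.txt
git add a.txt

staged=$(git diff --cached --name-only)   # index vs. HEAD: what the next commit will contain
unstaged=$(git diff --name-only)          # work-tree vs. index: what is not staged yet
echo "staged:   $staged"
echo "unstaged: $unstaged"
```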
Run git diff --staged to check which files you changed and where, then carry on making other changes.
One practical purpose of staging is the logical separation of commits.
As staging allows you to continue making edits to the files/working directory, and make commits in parts when you think things are ready, you can use separate stages for logically unrelated edits.
Suppose you have 4 files: fileA.html, fileB.html, fileC.html and fileD.html. You make changes to all 4 files and are ready to commit, but the changes in fileA.html and fileB.html are logically related (for example, the same new feature implemented in both files) while the changes in fileC.html and fileD.html are separate and logically unrelated to the previous two files. You can first stage fileA.html and fileB.html and commit those.
git add fileA.html
git add fileB.html
git commit -m "Implemented new feature XYZ"
Then, in the next step, you stage and commit the changes to the remaining two files.
git add fileC.html
git add fileD.html
git commit -m "Implemented another feature EFG"
To expand on Ben Jackson's answer, which is fine, let's look at the original question closely. (See his answer for "why bother"-type questions; this is more about what is going on.)
I'm new to version control and I understand that "committing" is essentially creating a backup while updating the new 'current' version of what you're working on.
This isn't quite right. Backups and version control are certainly related (exactly how strongly depends on some things that are to some extent matters of opinion), but there are certainly some differences, if only in intent: Backups are typically designed for disaster recovery (machine fails, fire destroys entire building including all storage media, etc.). Version control is typically designed for finer-grained interactions and offers features that backups don't. Backups are typically stored for some time, then jettisoned as "too old": a fresher backup is all that matters. Version control normally saves every committed version forever.
What I don't understand is what staging is for from a practical perspective. Is staging something that exists in name only or does it serve a purpose? When you commit, it's going to commit everything anyway, right?
Yes and no. Git's design here is somewhat peculiar. There exist version control systems that don't require a separate staging step. For instance, Mercurial, which is otherwise a lot like Git in terms of usage, doesn't require a separate hg add step, beyond the very first one that introduces an all-new file. With Mercurial, you use the hg command that selects some commit, then you do your work, then you run hg commit, and you're done. With Git, you use git checkout,1 then you do your work, then you run git add, and then git commit. Why the extra git add step?
The secret here is what Git calls, variously, the index, or the staging area, or sometimes—rarely these days—the cache. These are all names for the same thing.
Edit: I think I may be confusing the terminology. Is a 'staged' file the same thing as a 'tracked' file?
No, but these are related. A tracked file is one that exists in Git's index. To properly understand the index, it's good to start with understanding commits.
1 Since Git version 2.23, you can use git switch instead of git checkout. For this particular case, these two commands do exactly the same thing. The new command exists because git checkout got over-stuffed with too many things; they got split out into two separate commands, git switch and git restore, to make it easier and safer to use Git.
In Git, a commit saves a full snapshot of every file that Git knows about. (Which files does Git know about? We'll see that in the next section.) These snapshots are stored in a special, read-only, Git-only, compressed and de-duplicated form, that in general only Git itself can read. (There's more stuff in each commit than just this snapshot, but that's all we will cover here.)
The de-duplication helps with space: we normally only change a few files, then make a new commit. So most of the files in a commit are the same as the files in the previous commit. By simply re-using those files directly, Git saves lots of space: if we only touched one file, the new commit only takes space for one new copy. Even then it's compressed (sometimes very compressed, though this actually happens later), so that a .git directory can actually be smaller than the files it contains, once they're expanded out to normal everyday files. The de-duplication is safe because the committed files are frozen for all time. Nobody can go change one, so it's safe for commits to depend on each other's copies.
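One way to see the re-use for yourself is to compare the internal object IDs of a file across two commits. This is only a sketch, with a scratch repository and made-up file names (unchanged.txt, changed.txt):

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "stable content" > unchanged.txt
echo "v1" > changed.txt
git add unchanged.txt changed.txt
git commit -q -m "First commit"

# Touch only one of the two files, then commit again.
echo "v2" > changed.txt
git add changed.txt
git commit -q -m "Second commit"

# Object IDs of each file as stored in the previous and the current commit.
old_unchanged=$(git rev-parse HEAD~1:unchanged.txt)
new_unchanged=$(git rev-parse HEAD:unchanged.txt)
old_changed=$(git rev-parse HEAD~1:changed.txt)
new_changed=$(git rev-parse HEAD:changed.txt)

echo "unchanged.txt: $old_unchanged -> $new_unchanged"
echo "changed.txt:   $old_changed -> $new_changed"
```

The untouched file has the same ID in both commits, meaning both snapshots share one stored copy; only the edited file gets a new one.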
Because the stored files are in this special, frozen-for-all-time, Git-only format, though, Git has to expand out each file into an ordinary everyday copy. This ordinary copy isn't Git's copy: it is your copy, to do with as you will. Git will just write to these when you tell it to do so, so that you have your copies to work with. These usable copies are in your working tree or work-tree.
What this means is that when you check out some particular commit, there are automatically two copies of each file:
Git has a frozen-for-all-time, Git-ified copy in the current commit. You can't change this copy (though you can of course select a different commit, or make a new commit).
You have, in your work-tree, a normal-format copy. You can do anything you want to this, using any of the commands on your computer.
Other version control systems (including Mercurial as mentioned above) stop here, with these two copies. You just modify your work-tree copy, then commit. Git ... doesn't.
In between these two copies, Git stores a third copy2 of every file. This third copy is in the frozen format, but unlike the frozen copy in the commit, you can change it. To change it, you use git add.
The git add command means make the index copy of the file match the work-tree copy. That is, you are telling Git: Replace the frozen-format, de-duplicated copy that's in the index now, by compressing my updated work-tree copy, de-duplicating it, and getting it ready to be frozen into a new commit. If you don't use git add, the index still holds the frozen-format copy from the current commit.
When you run git commit, Git packages up whatever is in the index right then to use as the new snapshot. Since it's already in the frozen format, and pre-de-duplicated, Git does not have to do a lot of extra work.
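To make the three copies concrete, here is a small sketch in a scratch repository (file.txt and its contents are invented for the demo). It leaves a different version of the same file in the commit, the index, and the work-tree, then commits:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "committed" > file.txt
git add file.txt
git commit -q -m "Initial commit"

echo "staged" > file.txt
git add file.txt                  # index copy now says "staged"
echo "work-tree" > file.txt       # work-tree copy moves on again

in_commit=$(git show HEAD:file.txt)   # frozen copy in the current commit
in_index=$(git show :file.txt)        # copy in the index
in_worktree=$(cat file.txt)           # ordinary file on disk
echo "commit: $in_commit / index: $in_index / work-tree: $in_worktree"

# The new commit is built from the index, not from the work-tree.
git commit -q -m "Second commit"
after_commit=$(git show HEAD:file.txt)
echo "new commit: $after_commit"
```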
This also explains what untracked files are all about. An untracked file is a file that is in your work-tree but isn't in Git's index right now. It doesn't matter how the file wound up in this state. Maybe you copied it from some other place on your computer, into your work-tree. Maybe you created it fresh here. Maybe there was a copy in Git's index, but you removed that copy with git rm --cached. One way or another, there is a copy here in your work-tree, but there isn't a copy in Git's index. If you make a new commit now, that file won't be in the new commit.
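The git rm --cached route can be sketched like this, again in a scratch repository with a made-up file name (notes.txt):

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "data" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Remove the index copy only; the work-tree copy stays on disk.
git rm -q --cached notes.txt

tracked=$(git ls-files)                                # files in the index
untracked=$(git ls-files --others --exclude-standard)  # work-tree files not in the index
echo "tracked: '$tracked' / untracked: '$untracked'"

# The next commit is made from the index, so it no longer contains the file.
git commit -q -m "Stop tracking notes.txt"
in_new_commit=$(git ls-tree -r --name-only HEAD)
echo "files in new commit: '$in_new_commit'"
```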
Note that git checkout initially fills in Git's index from the commit you check out. So the index starts out matching the commit. Git also fills in your work-tree from this same source. So, initially, all three match. When you change files in your work-tree and git add them, well, now the index and your work-tree match. Then you run git commit and Git makes a new commit from the index, and now all three match again.
Because Git makes new commits from the index, we can put things this way: Git's index holds the next commit you plan to make. This ignores the expanded role that Git's index takes on during a conflicted merge, but we'd like to ignore that for now anyway. :-)
That's all there is to it, but it's still pretty complicated! It's particularly tricky because there's no easy way to see exactly what is in Git's index.3 But there is a Git command that tells you what's going on, in a way that's pretty useful, and that command is git status.
2 Technically, this isn't actually a copy at all. Instead, it's a reference to the Git-ified file, pre-de-duplicated and everything. There's more stuff in here as well, such as the mode, file name, a staging number, and some cache data to make Git go fast. But unless you get into working with some of Git's low-level commands (git ls-files --stage and git update-index in particular), you can just think of it as a copy.
3 The git ls-files --stage command will show you the names and staging numbers of every file in Git's index, but usually this isn't very useful anyway.
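If you are curious what that low-level listing looks like, here is a quick sketch in a scratch repository with one hypothetical file, hello.txt. Each line holds the mode, the object ID, the staging number, and then (after a tab) the file name:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q

echo "hello" > hello.txt
git add hello.txt                 # no commit needed; the file is in the index now

entry=$(git ls-files --stage)
echo "$entry"

# Pull the line apart: fields 1-3 are space-separated, the name follows a tab.
mode=$(echo "$entry" | awk '{print $1}')
stage=$(echo "$entry" | awk '{print $3}')
name=$(echo "$entry" | cut -f2)
echo "mode=$mode stage=$stage name=$name"
```

Outside of a conflicted merge the staging number is 0.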
git status
The git status command actually works by running two separate git diff commands for you (and also doing some other useful stuff, such as telling you which branch you're on).
The first git diff compares the current commit (which, remember, is frozen for all time) to whatever is in Git's index. For files that are the same, Git will say nothing at all. For files that are different, Git will tell you that this file is staged for commit. This includes all-new files (if the commit doesn't have sub.py in it, but the index does have sub.py in it, then this file is added) and any removed files that were (and are) in the commit but aren't in the index any more (git rm, perhaps).
The second git diff compares all the files in Git's index to the files in your work-tree. For files that are the same, Git says nothing at all. For files that are different, Git will tell you that this file is not staged for commit. Unlike the first diff, this particular list doesn't include files that are all-new: if the file untracked exists in your work-tree, but not in Git's index, Git just adds it to the list of untracked files.4
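You can approximate the two comparisons by hand. The sketch below sets up a scratch repository (staged.txt and unstaged.txt are invented names; sub.py and untracked follow the text above) and runs each diff separately. It is not literally what git status executes internally, just the same two comparisons:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > staged.txt
echo "v1" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Initial commit"

echo "v2" > staged.txt;   git add staged.txt   # modified and staged
echo "v2" > unstaged.txt                       # modified, not staged
echo "print('hi')" > sub.py; git add sub.py    # all-new file, staged
echo "scratch" > untracked                     # all-new file, never added

first_diff=$(git diff --cached --name-status)  # current commit vs. index
second_diff=$(git diff --name-status)          # index vs. work-tree
untracked_list=$(git ls-files --others --exclude-standard)

echo "staged for commit:";     echo "$first_diff"
echo "not staged for commit:"; echo "$second_diff"
echo "untracked:";             echo "$untracked_list"
```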
At the end, having accumulated these untracked files in a list, git status will announce those files' names too, but there's a special exception: if a file's name is listed in a .gitignore file, that suppresses this last listing. Note that listing a tracked file (one that's in Git's index) in a .gitignore has no effect here: the file is in the index, so it gets compared, and gets committed, even if it's listed in .gitignore. The ignore file only suppresses the "untracked file" complaints.5
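Here is a sketch of that behavior, assuming a scratch repository where tracked.log is already committed before a *.log ignore rule appears (all names are made up):

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > tracked.log
git add tracked.log
git commit -q -m "Track a log file"

# Now ignore every *.log file, and commit the ignore rule itself.
echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "Ignore logs"

echo "v2" > tracked.log           # tracked, so still compared despite the rule
echo "noise" > other.log          # untracked and ignored

modified=$(git diff --name-only)
untracked=$(git ls-files --others --exclude-standard)
echo "modified: '$modified' / untracked: '$untracked'"

# An en-masse add picks up the tracked file and skips the ignored one.
git add .
staged=$(git diff --cached --name-only)
echo "staged: '$staged'"
```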
4 When using the short version of git status (git status -s), the untracked files aren't as separated-out, but the principle is the same. Accumulating the files like this also lets git status summarize a bunch of untracked files' names by just printing a directory name, sometimes. To get the full list, use git status -uall or git status -u.
5 Listing a file also makes en-masse add-many-files operations like git add . or git add * skip over the untracked file. This part gets a little more complicated, since you can use git add --force to add a file that would normally be skipped. There are some other normally-minor special cases, all of which add up to this: the file .gitignore might be more properly called .git-do-not-complain-about-these-untracked-files-and-do-not-auto-add-them or something equally unwieldy. But that's too ridiculous, so .gitignore it is.
git add -u, git commit -a, etc
There are several handy shortcuts to know about here:
git add . will add all updated files in the current directory and any sub-directory. This respects .gitignore, so if a file that is currently untracked is not complained-about by git status, it won't be auto-added.
git add -u will auto-add all updated files anywhere in your work-tree.6 This affects only tracked files. Note that if you've removed the work-tree copy, this will remove the index copy too (git add does this as part of its make the index match the work-tree thing).
git add -A is like running git add . from the top level of your work-tree (but see footnote 6).
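A short sketch of how git add -u and git add -A differ, assuming Git 2.0 or later and a scratch repository with invented file names:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "first version" > tracked.txt
echo "this file is about to be deleted" > doomed.txt
git add tracked.txt doomed.txt
git commit -q -m "Initial commit"

echo "second version" > tracked.txt          # modify a tracked file
rm doomed.txt                                # remove a tracked file
echo "completely new material" > brand_new.txt   # create an untracked file

# -u updates only files that are already tracked (including removals).
git add -u
after_u=$(git diff --cached --name-status)
echo "after git add -u:"; echo "$after_u"

# -A also picks up the untracked file.
git add -A
after_A=$(git diff --cached --name-status)
echo "after git add -A:"; echo "$after_A"
```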
Besides these, you can run git commit -a, which is roughly equivalent7 to running git add -u and then git commit. That is, this gets you the same behavior that is convenient in Mercurial.
I generally advise against the git commit -a pattern: I find that it's better to use git status often, look closely at the output, and if the status is not what you expected, figure out why that's the case. Using git commit -a, it's too easy to accidentally modify a file and commit a change you didn't intend to commit. But this is mostly a matter of taste / opinion.
6 If your Git version predates Git 2.0, be careful here: git add -u only works on the current directory and sub-directories, so you must climb to the top level of your work-tree first. The git add -A option has a similar issue.
7 I say roughly equivalent because git commit -a actually works by making an extra index, and using that other index to do the commit. If the commit works, you get the same effect as doing git add -u && git commit. If the commit doesn't work (if you make Git skip the commit in any of the many ways you can do that), then no files are git add-ed afterward, because Git throws out the temporary extra index and goes back to using the main index.
There are additional complications that come in if you use git commit --only here. In this case, Git creates a third index, and things get very tricky, especially if you use pre-commit hooks. This is another reason to use separate git add operations.
It is easier to understand the use of the git commands add and commit if you imagine a log file being maintained in your repository on GitHub.
A typical project's log file for me may look like:
---------------- Day 1 --------------------
Message: Complete Task A
Index of files changed: File1, File2
Message: Complete Task B
Index of files changed: File2, File3
-------------------------------------------
---------------- Day 2 --------------------
Message: Correct typos
Index of files changed: File3, File1
-------------------------------------------
...
...
...and so on
I usually start my day with a git pull and end it with a git push. So everything inside a day's record corresponds to what occurs between them. During each day, there are one or more logical tasks that I complete which require changing a few files. The files edited during that task are listed in an index.
Each of these sub-tasks (Task A and Task B here) is an individual commit. The git add command adds files to the 'Index of files changed' list. This process is also called staging. The git commit command records/finalizes the changes and the corresponding index list along with a custom message.
Remember that you're still only changing the local copy of your repository and not the one on GitHub. Only when you do a git push do all these recorded changes, along with the index of files for each commit, get logged on the main repository (on GitHub).
As an example, to obtain the Day 2 entry in that imaginary log file, I would have done:
git pull
# Make changes to these files
git add File3 File1
# Verify changes, run tests etc..
git commit -m 'Correct typos'
git push
In a nutshell, git add and git commit let you break down a change to the main repository into systematic logical sub-changes. As other answers and comments have pointed out, there are of course many more uses for them. However, this is one of the most common usages and a driving principle behind Git being a multi-stage revision control system, unlike other popular ones like SVN.
The staging area helps us craft commits with greater flexibility. By crafting, I mean breaking up the commits into logical units. This is crucial if you want maintainable software. The most obvious way you can achieve this:
You can work on multiple features/bugs in a single working directory and still craft meaningful commits. Having a single working directory which contains all of our active work is also very convenient. (This can be done without a staging area, but only as long as the changes never overlap within a file. And you also have the added responsibility of manually tracking whether they overlap.)
You can find more examples here: Uses of Index
And the best part is, the advantages do not stop with this list of workflows. If a unique workflow does come up, you can be almost sure that the staging area will help you out.
I see the point of using staging to make commits smaller, as mentioned by @Ben Jackson and @Tapashee Tabassum Urmi, and sometimes I use it for that purpose, but I mainly use it to make my commits larger! Here is my point:
Say I want to add a small feature which requires several smaller steps. I don't see any point in having a separate commit for each of these steps and flooding my timeline. However, I want to save each step and go back if necessary.
I simply stage the smaller steps on top of each other, and when I feel the result is worthy of a commit, I commit. This way I keep the unnecessary commits out of the timeline, yet I am still able to undo (checkout) the last step.
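Roughly, that workflow might look like the sketch below. The repository, feature.txt, and the step contents are made up; git checkout -- <file> restores the work-tree copy from the index, which is what makes the last staged step act as a save point:

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "base" > feature.txt
git add feature.txt
git commit -q -m "Initial commit"

# Step 1 works, so save it in the index without committing.
echo "step 1" >> feature.txt
git add feature.txt

# Step 2 turns out to be a bad idea: throw it away and fall back to step 1.
echo "step 2 (bad idea)" >> feature.txt
git checkout -- feature.txt       # work-tree copy is restored from the index
after_undo=$(cat feature.txt)

# Redo step 2 properly, stage it on top, and commit once.
echo "step 2" >> feature.txt
git add feature.txt
git commit -q -m "Add small feature"

commit_count=$(git rev-list --count HEAD)
echo "commits in history: $commit_count"
```

The history ends up with one commit for the whole feature, even though the work was saved in two stages.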
I see other ways for doing this (simplifying the git history) which you might use depending on your preference: