
Why staging directory is also called Index/Git Index?

Tags: git, git-index

I am confused by the naming of the staging directory (Git Index) in Git.

Is there a special reason it is called the Index? Why not just call it a cache or a temp directory, which would be easier to understand?

To me, an index is something that helps us search for things faster, like indexing in a DBMS. How does that relate to the staging area?

I searched on Google but still don't have a clear idea. Reference: Git Index

TheOneTeam asked Jul 16 '11 08:07


People also ask

Is Index same as staging area in git?

The Staging area is a conceptual view for the user, while Index is more of a Git developer viewpoint (where they keep the lists of what is in the 'staging area').

What is staging index in git?

The Git index is a staging area between the working directory and the repository. It is used to build up a set of changes that you want to commit together. To understand the Git index, first understand the working directory and the repository.

Which command shows the difference between the working directory and the index or staging area?

The git status command will show you the different states of files in your working directory and staging area.

What does Index added mean in git?

Adding a file to the index means that Git calculates the SHA-1 sum of the file's contents, stores those contents as a blob object, and records the file's path with that hash in the staging area.


1 Answer

An article by the main Git maintainer, Junio C. Hamano, is instructive for grasping the difference between cache and index:
(emphasis mine)

When Linus started writing git, his aim was to allow him to reproduce each and every intermediate state produced by his original "tarball and patches" workflow he used before the BitKeeper days.
Starting from a 2.6.12 tarball, he queues patch-1, patch-2,... so 2.6.12 itself, 2.6.12 with patch-1 applied, 2.6.12 with both patch-1 and patch-2 applied, become three versions.

But this won't obviously scale if you have to shuffle hundreds of patches a day. So he invented "directory cache"; as a concept, this roughly corresponds to "tree" objects in today's git: a collection of records, each of which is a compact representation of what a whole directory structure contains.
The way to build it was to "add the contents to the cache, or update the contents in the cache".

The control directory to host the collection of such version control records was named ".dircache" (this was renamed to ".git" after some time).
There was a file called ".dircache/index", and the contents of this file was read and manipulated in a set of variables in C that were named after a noun, "cache".
Back then, the concept of what we today call the index, a buffer area to build up the collection of contents you intend to write out as a tree object, was called "cache".
Everybody talked about "cache" and "index" interchangeably, as the file that records what is in the "cache" was named "index". It was (and it still is) an index to allow you to find the contents in the cache by giving it a pathname.

As more and more people started using git without having to read its code at all, the use of the word "index" has become more prevalent for obvious reasons.
As something that is on the filesystem, it is much more visible than the variable name in the C source code.
Eventually, we stopped using "cache" as a noun to name what we call "the index" today when explaining the use of git as the end-user.
The word "cache" however is still used as a noun when we want to talk about the internal data structure in the context of discussing git implementation (e.g. "Let's make it possible for programs to work with more than one cache at the same time").

At the end user level, "cache" is only used as an adjective these days; "cached", meaning "contents cached in the index, not the contents in the work tree".
We could have called it "indexed", but "cached contents" was an already established phrase from very early days to mean that exact concept, and we did not need another word that meant the same thing.

[...] In the earlier days, there was a distinction between "adding a new file to the index" and "updating a file that is already in the index with new contents".
[...] Modern (and medieval) versions of git uses "git add" for both. We could have been just honest and called the act of updating-or-adding-to-the-index "add", but some people in "git training" industry started teaching the index as "the staging area for the next commit", and as an inevitable consequence, a verb "to stage" started to appear in many documentation to mean "the act of adding contents to the index".
I sometimes use this verb myself, but that is only when I suspect that the audience might have learned git first from these new people. Strictly speaking this is a redundant and fairly recent word in git vocabulary.
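
To make the "index by pathname" idea from the quote concrete, here is a minimal sketch in Python. It is not Git's implementation: `ToyIndex`, `add`, and `lookup` are hypothetical names invented for illustration, and the real index file also records file modes, stat data, and merge stages. The one part taken from Git itself is how a blob ID is computed (SHA-1 over a `blob <size>\0` header followed by the content).

```python
import hashlib
from typing import Dict, Optional


def blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ToyIndex:
    """Hypothetical stand-in for the index: a map from pathname to blob ID."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def add(self, path: str, data: bytes) -> str:
        # A single operation covers both cases from the quote:
        # adding a new path and updating a path that is already present.
        oid = blob_id(data)
        self.entries[path] = oid
        return oid

    def lookup(self, path: str) -> Optional[str]:
        # The "index" part: find the cached content's ID by giving a pathname.
        return self.entries.get(path)


if __name__ == "__main__":
    index = ToyIndex()
    index.add("README", b"")          # new path
    index.add("README", b"hello\n")   # same path, new contents
    print(index.lookup("README"))
    print(index.lookup("no/such/file"))
```

In this toy model, "cached contents" are whatever `lookup` returns for a path, regardless of what the file in the work tree currently contains.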

VonC answered Sep 28 '22 03:09