What does "git ls-files" do exactly and how do we remove a file from it?

Tags:

git

Does it show files from the local repository, the staging repository, the remote repository or from somewhere else?

I keep seeing a file in the output of "git ls-files". That file was deleted from the remote repository, after which I ran git pull. However, the file still shows up in the command's output. It should not be listed, because it's no longer present in the remote repository.

415

asked May 21 '19 09:05

Mugen

1 Answer

Summary

You need to wrap your head around the idea that Git stores at least three, and sometimes up to five active copies of each file: one in the current commit, one (or two or three!) in the index, and one—the only one you can see and work with—in your work-tree. The git ls-files command looks at these copies, then tells you something about some of them, depending on the flags you supply to git ls-files.

Without this idea of three-to-five copies of each file, lots of things in Git will never make any sense. (Well, some things are still tricky even with it, but that's another problem entirely. 😀)

Long

I think there are two issues here. One requires some terminology and then the other should fall into place:

Does [git ls-files] show files from the local repository,

Sort of, but:

the staging repository,

Git does not have a staging repository. Each repository has something that is called, in different Git documentation, either the index or the staging area. (There's an obsoleted third name, cache, that also appears in the Git glossary.)

the remote repository

Definitely not: there need not be any remote repositories—i.e., other Gits with their own repositories—at all, and if there are, only git fetch and git push have your Git call up their Git and exchange data with them. (Well, git ls-remote does the first little bit of git fetch, and git pull runs git fetch, so these two also exchange data with a remote. But git ls-files doesn't.)

or from somewhere else?

Yes, sort of. That gets us back to the first part. So let's take these three bits of terminology as defined in the Git glossary. The definitions quoted below come directly from that documentation:

repository

A collection of refs together with an object database containing all objects which are reachable from the refs, possibly accompanied by meta data from one or more porcelains. A repository can share an object database with other repositories via alternates mechanism. (all links theirs)

This of course is full of yet more terminology. To attempt to de-mystify it a bit, what they're saying here is that the repository proper doesn't include the index and work-tree: it's mostly made up of the commits (and their contents). Of course, that requires that we define "index" and "work-tree", so let's move on to:

index

A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. Truth be told, it can also contain a second, and even a third version of a working tree, which are used when merging.

working tree (I usually call this work-tree):

The tree of actual checked out files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.

Commits are frozen forever

When you run git commit, Git makes a snapshot of all of your files—well, all of your tracked files, anyway—and stores that, plus some metadata like your name and email address, in a commit. This commit is mostly permanent—you can get rid of commits, usually with a fair bit of difficulty, but just think of them as permanent for convenience—and is totally, completely, 100% read-only. It's read-only like this on purpose, because that allows other commits to share identical copies of files, so that if you commit the same file once, ten times, or even a million times, there's really only one copy of that file in the repository. It's only when you change the file to a new version that Git has to commit a new, separate copy.

The commits are numbered, but not by a nice easy sequential numbering system. That is, we might draw them as a series of simple numbered or lettered things:

... <-C4 <-C5 <-C6 ...

where each later commit points back to its immediate predecessor. But their actual names are big ugly hash IDs. Each one is guaranteed to be unique, which is why they have to be so big and ugly and random-looking. Each hash ID is actually a cryptographic checksum, calculated over the commit's contents, so that every Git everywhere in the universe will agree that that commit, and only that commit, gets that checksum. That's the other reason you—and even Git—can't change it: if you take a commit out of the repository database, tinker with it, and change even one single bit and then put it back into the database, what you get is a new commit with a new and different hash ID.

So commits are totally frozen, forever. The files inside them are frozen forever as well, and compressed, and in a special Git-only format. I like to call these files "freeze-dried". What this means is that, hey, they're great for archiving, but they are utterly useless for getting any new work done ... and that means that Git must provide some way of taking these freeze-dried files and rehydrating them into a useful form.
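To make the "hash ID is a checksum of the contents" idea concrete, here is a minimal sketch in a throwaway repository. The directory, file name, user name, and commit message are all made up for illustration; only the Git commands themselves are real.

```shell
# Scratch repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first version" > file.ext
git add file.ext
git commit -q -m "first commit"

# The commit's "name" is its hash ID.
commit_id=$(git rev-parse HEAD)
echo "commit:   $commit_id"

# Print the frozen commit object: tree, author, committer, message.
git cat-file -p HEAD

# Hashing the same raw commit contents again yields the same hash ID.
rehashed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "rehashed: $rehashed"

# Changing a single character of the contents yields a different hash ID,
# i.e. a different commit, not a modified version of the old one.
tampered=$(git cat-file commit HEAD | sed 's/first commit/first commiT/' \
  | git hash-object -t commit --stdin)
echo "tampered: $tampered"
```

Note that git hash-object is run without -w here, so it only computes the hash IDs and writes nothing into the repository.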

The work-tree provides the useful-form copies

Things don't really get much simpler than this: the work-tree has the useful-form, rehydrated copies of your files. Because they're just ordinary everyday files on your computer, you can see them, use them, change them around however you like, and otherwise work with them. They're technically not in the repository at all—they are more just right next to it. In a typical setup, the repository itself is in the .git directory/folder of the top level of your work-tree.

Obviously, if there's a commit you've extracted to make the work-tree, there must now be two copies of each file: the freeze-dried committed one, plus the regular working one. Git could stop here. Mercurial does stop here: if you use Mercurial instead of Git, you don't need to concern yourself with a third copy, because there is no third copy. But Git goes on to store yet more copies of the files.

The index / staging-area sits between the commit and the work-tree

What Git does here is to interpose a third copy of each file, between the freeze-dried committed copy and the work-tree copy. This third copy is in the committed-file format (i.e., pre-dehydrated), but because it is not in a commit, it's not actually totally frozen: it can be replaced at any time. That's what git add does: git add takes the ordinary copy of the file from the work-tree, compresses it down into the freeze-dried format, and replaces the copy that's in the index. Or, if the file wasn't in the index at all, it puts a copy into the index.

This is why you have to git add files all the time. In Mercurial, you only hg add a file once. After that, you just run hg commit, and Mercurial looks at all the files it knows about, and freezes them into a new commit. This can take a long time, in a big repository. Git, by contrast, already has all the files it's supposed to know about, and already dehydrated, in the index, so git commit can just package up those dehydrated files into a new frozen commit. The cost of this speed is git add, but if you get into playing clever tricks with the index copies—e.g., using git add -p—you get more benefits than just the speedup.
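A small sketch of git add replacing the index copy, again in a scratch repository with made-up names. The blob hash ID printed by git ls-files --stage identifies the index copy, so it should stay the same while you merely edit the work-tree file and change only once you run git add.

```shell
# Scratch repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first version" > file.ext
git add file.ext
git commit -q -m "first commit"

# Index entry for file.ext: mode, blob hash ID, stage number, path.
before=$(git ls-files --stage file.ext)
echo "index before edit: $before"

# Editing the work-tree copy alone does not touch the index copy.
echo "second version, edited in the work-tree" > file.ext
unchanged=$(git ls-files --stage file.ext)
echo "index after edit:  $unchanged"

# git add replaces the index copy with the new contents.
git add file.ext
after=$(git ls-files --stage file.ext)
echo "index after add:   $after"
```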

As the Git glossary mentioned in its description of the index, the index takes on an expanded role during a conflicted merge. When you do a merge operation—whether that's from git merge, or from git revert or git cherry-pick or any other Git command that uses the merge engine—and it doesn't go smoothly, Git winds up putting all three inputs for each file into the index, so that instead of just one copy of file.ext, you get three. But as long as you're not in the middle of a merge, there's only one copy in the index.
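Here is a sketch of what those extra index copies look like. It builds a deliberately conflicting merge in a scratch repository (the branch name side, the file name, and the contents are invented for the example) and then lists the index entries for the conflicted file.

```shell
# Scratch repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > file.ext
git add file.ext
git commit -q -m "base"

# The initial branch name varies between Git setups, so look it up.
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b side
echo "change made on the side branch" > file.ext
git commit -q -a -m "side"

git checkout -q "$main_branch"
echo "a different change on the main branch" > file.ext
git commit -q -a -m "main"

# This merge is expected to stop with a conflict in file.ext.
git merge side > /dev/null 2>&1 || true

# One index entry per staging slot: merge base, ours, theirs.
git ls-files --stage file.ext
stage_numbers=$(git ls-files --stage file.ext | awk '{print $3}' | tr '\n' ' ')
echo "staging slots in use: $stage_numbers"
```

Once the conflict is resolved and you git add the file, the three higher-numbered entries are replaced by a single entry in slot zero.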

Usually the index copy matches the HEAD frozen copy, or matches the work-tree copy, or both. For instance, after a fresh git checkout, all three copies match. Then you modify file.ext in the work-tree: now the commit and the index match, but they're not the same as the work-tree copy. Then you git add file.ext, and now the index and work-tree match, but they're different from the frozen copy. Then you git commit to make a new commit, which becomes the current commit, and all three copies match again.

Note that you can modify the work-tree copy:

vim file.ext

then copy the updated one into the index:

git add file.ext

then edit it again:

vim file.ext

and that way, you can make all three copies different. If you do that, git status will say that you have changes staged for commit, because the index copy is different from the current-commit copy, and say that you have changes not staged for commit, because the work-tree copy is different from the index copy.
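The same sequence as a non-interactive sketch, with echo standing in for the editor and made-up file contents. It uses git status --porcelain, the script-friendly short form of git status, and then compares the hash IDs of the three copies directly.

```shell
# Scratch repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "committed text" > file.ext
git add file.ext
git commit -q -m "first commit"

echo "staged text, a bit longer" > file.ext
git add file.ext                      # index copy now differs from HEAD

echo "work-tree text, longer again and not added" > file.ext

# First column: index vs HEAD. Second column: work-tree vs index.
status=$(git status --porcelain file.ext)
echo "$status"

head_blob=$(git rev-parse HEAD:file.ext)    # frozen copy in the current commit
index_blob=$(git rev-parse :file.ext)       # copy in the index
worktree_blob=$(git hash-object file.ext)   # hash the work-tree copy would get
echo "HEAD:      $head_blob"
echo "index:     $index_blob"
echo "work-tree: $worktree_blob"
```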

The work-tree can contain files that aren't in the index at all

The index is initially just a copy of the current commit. Git then also copies those files to the work-tree, so that you can use them. But you can create files in the work-tree and not run git add on them. Those files aren't in the index now, and if you run git commit, they won't be in the new commit either, because Git builds the new commit from the index.

You can also remove files from the index, without removing them from the work-tree:

git rm --cached file.ext

removes the index copy. It can't touch the current commit frozen copy, of course, but if you now make a new commit, the new commit won't have file.ext in it at all. (The previous commit still does, of course.)

Any file that is in your work-tree right now, and is not in your index right now, is an untracked file. Its untracked-ness comes from the fact that it's not in your index. Put that file into your index and it's tracked, no matter how you got it into your index. Remove it from your index and it's untracked, no matter how you got it out of your index. So that's the last role of the index: to determine which files are tracked, and will therefore be in the next commit.
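Since the question asks how to remove a file from the git ls-files output, here is a minimal sketch of that, under the assumption that you want to keep the work-tree copy. The repository and file names are invented for the example.

```shell
# Scratch repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "some content" > file.ext
git add file.ext
git commit -q -m "add file.ext"

# Remove only the index copy; the work-tree copy stays on disk.
git rm -q --cached file.ext

tracked=$(git ls-files file.ext)                   # index: no longer listed
untracked=$(git ls-files --others file.ext)        # work-tree only: listed
in_head=$(git ls-tree --name-only HEAD file.ext)   # old commit still has it

echo "tracked:   '$tracked'"
echo "untracked: '$untracked'"
echo "in HEAD:   '$in_head'"
```

If you also want the work-tree copy gone, plain git rm file.ext removes both the index copy and the work-tree copy.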

Now we can see clearly what `git ls-files` does

What git ls-files does is to read the index and, depending on the options, compare it against the work-tree. (By default it does not look at the HEAD commit at all.) Depending on what arguments you give to git ls-files, it then prints the names of some or all files that are in the index and/or in the work-tree:

git ls-files --stage

lists the files that are in the index / staging-area, along with each entry's mode, blob hash ID, and staging slot number. (It says nothing about the copies in the HEAD commit or the work-tree.) Or:

git ls-files --others

lists the (names of the) files that are in the work-tree, but not in the index. (It says nothing about the copies in the HEAD commit.) Or:

git ls-files --modified

lists the (names of the) files that are in the index and whose work-tree copies differ from the index copies; a work-tree copy that has been deleted counts as modified too. (It says nothing about the copies in the HEAD commit.) With no options:

git ls-files

lists the (names of the) files that are in the index, with no regard for what files are in the HEAD commit or the work-tree.
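To compare these variants side by side, here is a sketch with three made-up files: one tracked and untouched, one tracked and then edited in the work-tree, and one never added. Per the git ls-files documentation, --modified reports files whose work-tree copy differs from the index copy, and --others reports work-tree files that are not in the index.

```shell
# Scratch repository; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "a" > tracked.txt
echo "b" > edited.txt
git add tracked.txt edited.txt
git commit -q -m "two tracked files"

echo "b, changed in the work-tree but not added" > edited.txt
echo "c" > new.txt                     # never added, so untracked

all=$(git ls-files)                    # everything in the index
others=$(git ls-files --others)        # in the work-tree, not in the index
modified=$(git ls-files --modified)    # work-tree copy differs from index copy

echo "in index:"; echo "$all"
echo "others:   $others"
echo "modified: $modified"
```

Note that new.txt never appears in the plain git ls-files output, and that nothing here contacts any remote repository.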

142

answered Oct 23 '22 22:10

torek
