
What does "git ls-files" do exactly and how do we remove a file from it?

Tags:

git

Does it show files from the local repository, the staging repository, the remote repository or from somewhere else?

I keep seeing a file listed by "git ls-files" even though it was deleted from the remote repository. After the deletion I ran git pull, but the file still shows up in the command's output. I expected it to disappear, because it is no longer present in the remote repository.

Mugen asked May 21 '19 09:05



1 Answer

Summary

You need to wrap your head around the idea that Git stores at least three, and sometimes up to five active copies of each file: one in the current commit, one (or two or three!) in the index, and one—the only one you can see and work with—in your work-tree. The git ls-files command looks at these copies, then tells you something about some of them, depending on the flags you supply to git ls-files.

Without this idea of three-to-five copies of each file, lots of things in Git will never make any sense. (Well, some things are still tricky even with it, but that's another problem entirely. 😀)

Long

I think there are two issues here. One requires some terminology and then the other should fall into place:

Does [git ls-files] show files from the local repository,

Sort of, but:

the staging repository,

Git does not have a staging repository. Each repository has something that is called, in different Git documentation, either the index or the staging area. (There's an obsoleted third name, cache, that also appears in the Git glossary.)

the remote repository

Definitely not: there need not be any remote repositories—i.e., other Gits with their own repositories—at all, and if there are, only git fetch and git push have your Git call up their Git and exchange data with them. (Well, git ls-remote does the first little bit of git fetch, and git pull runs git fetch, so these two also exchange data with a remote. But git ls-files doesn't.)

or from somewhere else?

Yes, sort of. That gets us back to the first part. So let's take these three bits of terminology as defined in the Git glossary. In each entry below, the first paragraph is quoted directly from that glossary:

  • repository

    A collection of refs together with an object database containing all objects which are reachable from the refs, possibly accompanied by meta data from one or more porcelains. A repository can share an object database with other repositories via alternates mechanism. (all links theirs)

    This of course is full of yet more terminology. To attempt to de-mystify it a bit, what they're saying here is that the repository proper doesn't include the index and work-tree: it's mostly made up of the commits (and their contents). Of course, that requires that we define "index" and "work-tree", so let's move on to:

  • index

    A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. Truth be told, it can also contain a second, and even a third version of a working tree, which are used when merging.

  • working tree (I usually call this work-tree):

    The tree of actual checked out files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.

Commits are frozen forever

When you run git commit, Git makes a snapshot of all of your files—well, all of your tracked files, anyway—and stores that, plus some metadata like your name and email address, in a commit. This commit is mostly permanent—you can get rid of commits, usually with a fair bit of difficulty, but just think of them as permanent for convenience—and is totally, completely, 100% read-only. It's read-only like this on purpose, because that allows other commits to share identical copies of files, so that if you commit the same file once, ten times, or even a million times, there's really only one copy of that file in the repository. It's only when you change the file to a new version that Git has to commit a new, separate copy.
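To see that sharing in action, here is a minimal sketch, assuming a throwaway repository in a temporary directory (the file names, user identity and commit messages are made up for illustration). It makes two commits that both contain an unchanged file.ext, then asks each commit which stored object holds that file:

```shell
# Hypothetical scratch repository; nothing here touches your real work.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "unchanging content" > file.ext
git add file.ext
git commit -q -m "first commit"

# The second commit adds a different file and leaves file.ext alone.
echo "something else" > other.ext
git add other.ext
git commit -q -m "second commit"

# Ask each commit for the hash ID of its copy of file.ext.
first_blob=$(git rev-parse HEAD~1:file.ext)
second_blob=$(git rev-parse HEAD:file.ext)
echo "first commit:  $first_blob"
echo "second commit: $second_blob"
```

Both lines should show the same hash ID, which is how you can tell that the two commits refer to one stored copy rather than two.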

The commits are numbered, but not by a nice easy sequential numbering system. That is, we might draw them as a series of simple numbered or lettered things:

... <-C4 <-C5 <-C6 ...

where each later commit points back to its immediate predecessor. But their actual names are big ugly hash IDs. Each one is guaranteed to be unique, which is why they have to be so big and ugly and random-looking. Each hash ID is actually a cryptographic checksum, calculated over the commit's contents, so that every Git everywhere in the universe will agree that that commit, and only that commit, gets that checksum. That's the other reason you—and even Git—can't change it: if you take a commit out of the repository database, tinker with it, and change even one single bit and then put it back into the database, what you get is a new commit with a new and different hash ID.
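You can check the checksum claim yourself. The sketch below assumes a scratch repository (names and messages are invented): it re-hashes the raw contents of a commit with git hash-object, then hashes a copy in which one character of the message has been altered.

```shell
# Hypothetical scratch repository used only for this experiment.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > file.ext
git add file.ext
git commit -q -m "first commit"

actual_id=$(git rev-parse HEAD)

# Hash the commit's raw contents again: this reproduces the commit's ID.
rehashed_id=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# Alter one character of the message and hash that instead.
# Nothing is written to the repository; we only compute the ID.
tinkered_id=$(git cat-file commit HEAD \
  | sed 's/first commit/first commiT/' \
  | git hash-object -t commit --stdin)

echo "actual:   $actual_id"
echo "rehashed: $rehashed_id"
echo "tinkered: $tinkered_id"
```

The first two IDs should match and the third should differ, which is the sense in which a changed commit is really a new commit.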

So commits are totally frozen, forever. The files inside them are frozen forever as well, and compressed, and in a special Git-only format. I like to call these files "freeze-dried". What this means is that, hey, they're great for archiving, but they are utterly useless for getting any new work done ... and that means that Git must provide some way of taking these freeze-dried files and rehydrating them into a useful form.

The work-tree provides the useful-form copies

Things don't really get much simpler than this: the work-tree has the useful-form, rehydrated copies of your files. Because they're just ordinary everyday files on your computer, you can see them, use them, change them around however you like, and otherwise work with them. They're technically not in the repository at all—they are more just right next to it. In a typical setup, the repository itself is in the .git directory/folder of the top level of your work-tree.

Obviously, if there's a commit you've extracted to make the work-tree, there must now be two copies of each file: the freeze-dried committed one, plus the regular working one. Git could stop here. Mercurial does stop here: if you use Mercurial instead of Git, you don't need to concern yourself with a third copy, because there is no third copy. But Git goes on to store yet more copies of the files.

The index / staging-area sits between the commit and the work-tree

What Git does here is to interpose a third copy of each file, between the freeze-dried committed copy and the work-tree copy. This third copy is in the committed-file format (i.e., already freeze-dried), but because it is not in a commit, it is not actually frozen: it can be replaced at any time. That's what git add does: git add takes the ordinary copy of the file from the work-tree, compresses it down into the freeze-dried format, and replaces the copy that's in the index. Or, if the file wasn't in the index at all, it puts a copy into the index.
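A small experiment makes the replacement visible. This is only a sketch in a scratch repository with invented names: it reads the hash ID of the index copy (the second field of git ls-files --stage output) before and after git add.

```shell
# Hypothetical scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > file.ext
git add file.ext
git commit -q -m "version 1"

# Edit the work-tree copy only; the index copy is untouched so far.
echo "version two, edited" > file.ext
head_id=$(git rev-parse HEAD:file.ext)
index_before=$(git ls-files --stage file.ext | cut -d' ' -f2)

# git add replaces the index copy with the work-tree contents.
git add file.ext
index_after=$(git ls-files --stage file.ext | cut -d' ' -f2)
worktree_id=$(git hash-object file.ext)

echo "HEAD copy:        $head_id"
echo "index before add: $index_before"
echo "index after add:  $index_after"
echo "work-tree copy:   $worktree_id"
```

Before git add, the index copy still matches the committed copy; afterwards it matches the work-tree copy instead.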

This is why you have to git add files all the time. In Mercurial, you only hg add a file once. After that, you just run hg commit, and Mercurial looks at all the files it knows about, and freezes them into a new commit. This can take a long time, in a big repository. Git, by contrast, already has all the files it's supposed to know about, and already dehydrated, in the index, so git commit can just package up those dehydrated files into a new frozen commit. The cost of this speed is git add, but if you get into playing clever tricks with the index copies—e.g., using git add -p—you get more benefits than just the speedup.

As the Git glossary mentioned in its description of the index, the index takes on an expanded role during a conflicted merge. When you do a merge operation—whether that's from git merge, or from git revert or git cherry-pick or any other Git command that uses the merge engine—and it doesn't go smoothly, Git winds up putting all three inputs for each file into the index, so that instead of just one copy of file.ext, you get three. But as long as you're not in the middle of a merge, there's only one copy in the index.
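Here is a sketch of that three-copies situation, assuming a scratch repository with one made-up file and a made-up branch named side. Both branches change the same line, so the merge stops with a conflict, and git ls-files --stage then shows one entry per staging slot:

```shell
# Hypothetical scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.ext
git add file.ext
git commit -q -m "base"

# Change the file on a side branch...
git checkout -q -b side
echo "side change" > file.ext
git commit -q -am "side"

# ...and change the same line on the original branch.
git checkout -q -
echo "main change" > file.ext
git commit -q -am "main"

# The merge is expected to stop with a conflict, so ignore its exit status.
git merge side || true

# One entry per slot: 1 = merge base, 2 = ours, 3 = theirs.
git ls-files --stage file.ext
stages=$(git ls-files --stage file.ext | cut -f1 | cut -d' ' -f3 | tr '\n' ' ')
echo "slots in use: $stages"
```

Outside of a conflicted merge, the same command shows a single entry in slot 0.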

Usually the index copy matches the HEAD frozen copy, or matches the work-tree copy, or both. For instance, after a fresh git checkout, all three copies match. Then you modify file.ext in the work-tree: now the commit and the index match, but they're not the same as the work-tree copy. Then you git add file.ext, and now the index and work-tree match, but they're different from the frozen copy. Then you git commit to make a new commit, which becomes the current commit, and all three copies match again.

Note that you can modify the work-tree copy:

vim file.ext

then copy the updated one into the index:

git add file.ext

then edit it again:

vim file.ext

and that way, you can make all three copies different. If you do that, git status will say that you have changes staged for commit, because the index copy is different from the current-commit copy, and say that you have changes not staged for commit, because the work-tree copy is different from the index copy.
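The same sequence can be scripted, with echo standing in for the editor. This is a sketch in a scratch repository (the names are invented); git status --short prints one column for the index-versus-commit comparison and one for the work-tree-versus-index comparison:

```shell
# Hypothetical scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "committed" > file.ext
git add file.ext
git commit -q -m "initial"

echo "staged" > file.ext       # first edit
git add file.ext               # copy it into the index
echo "work-tree" > file.ext    # second edit, not added

# First M: index differs from the commit. Second M: work-tree differs from the index.
status=$(git status --short)
echo "$status"
```

The short status for file.ext comes out as MM, one M for each of the two differences.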

The work-tree can contain files that aren't in the index at all

The index is initially just a copy of the current commit. Git then also copies those files to the work-tree, so that you can use them. But you can create files in the work-tree and not run git add on them. Those files aren't in the index now, and if you run git commit, they won't be in the new commit either, because Git builds the new commit from the index.
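As a quick check, here is a sketch with two made-up files in a scratch repository, only one of which is ever added. After committing, git ls-tree lists what the new commit actually contains:

```shell
# Hypothetical scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "tracked" > tracked.ext
git add tracked.ext
echo "scratch notes" > scratch.ext   # created in the work-tree, never added

git commit -q -m "commit built from the index"

# List the files stored in the new commit.
committed=$(git ls-tree -r --name-only HEAD)
echo "$committed"
```

Only tracked.ext appears, even though scratch.ext was sitting in the work-tree when the commit was made.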

You can also remove files from the index, without removing them from the work-tree:

git rm --cached file.ext

removes the index copy. It can't touch the current commit frozen copy, of course, but if you now make a new commit, the new commit won't have file.ext in it at all. (The previous commit still does, of course.)
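This is also the direct way to make a name disappear from plain git ls-files output. A minimal sketch, assuming a scratch repository and a made-up file name:

```shell
# Hypothetical scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "data" > file.ext
git add file.ext
git commit -q -m "add file.ext"

# Remove the index copy only; the work-tree file stays where it is.
git rm -q --cached file.ext

tracked=$(git ls-files)                       # names in the index
untracked=$(git ls-files --others)            # in the work-tree, not in the index
in_head=$(git ls-tree -r --name-only HEAD)    # names in the current commit

echo "index:     [$tracked]"
echo "untracked: [$untracked]"
echo "HEAD:      [$in_head]"
```

The index is now empty, file.ext shows up as untracked, and the existing commit still has it. Running git rm file.ext without --cached would remove the work-tree file as well.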

Any file that is in your work-tree right now, and is not in your index right now, is an untracked file. Its untracked-ness comes from the fact that it's not in your index. Put that file into your index and it's tracked, no matter how you got it into your index. Remove it from your index and it's untracked, no matter how you got it out of your index. So that's the last role of the index: to determine which files are tracked, and will therefore be in the next commit.

Now we can see clearly what git ls-files does

What git ls-files does is to read the index and, when the options call for it, the work-tree. Depending on what arguments you give to git ls-files, it then prints the names of some or all files that are in the index and/or in the work-tree:

git ls-files --stage

lists the files that are in the index / staging-area, along with their staging slot numbers. (It says nothing about the copies in the HEAD commit and the work-tree.) Or:

git ls-files --others

lists the (names of the) files that are in the work-tree, but not in the index. (It says nothing about the copies in the HEAD commit.) Or:

git ls-files --modified

lists the (names of the) files that are in the index and whose work-tree copies are different from their index copies. (It says nothing about the copies in the HEAD commit.) With no options:

git ls-files

lists the (names of the) files that are in the index, with no regard for what files are in the HEAD commit or the work-tree.
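To compare these variants side by side, here is a sketch in a scratch repository with three invented files: one committed and left alone, one committed and then edited without git add, and one that is never added at all.

```shell
# Hypothetical scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > tracked.ext
echo "two" > edited.ext
git add tracked.ext edited.ext
git commit -q -m "initial"

echo "two, edited in the work-tree" > edited.ext   # changed, not added
echo "three" > new.ext                             # never added

all=$(git ls-files)                  # everything in the index
others=$(git ls-files --others)      # in the work-tree, not in the index
modified=$(git ls-files --modified)  # work-tree copy differs from index copy

echo "no options: $all"
echo "--others:   $others"
echo "--modified: $modified"
git ls-files --stage                 # mode, hash ID, slot number, name
```

With no options the output is edited.ext and tracked.ext, --others reports new.ext, and --modified reports edited.ext. None of these commands contacts another repository.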

torek answered Oct 23 '22 22:10