
Ignoring changes to a specific file when doing git pull

Tags:

git

There's a file in my project that I'd like to change locally without it being overwritten every time I pull from the repo, i.e. I want to reject incoming changes to that specific file. My solution so far has been to do git stash --> git pull --> git stash pop

The file is in .gitignore both locally and in the repo. I've tried git update-index --assume-unchanged and git update-index --skip-worktree, but to no avail. I was thinking of doing git rm --cached, but from what I've read it seems like this will delete the file from the repo, which is not what I want.

guest856 asked Oct 08 '18 13:10


1 Answer

The file is in .gitignore both locally and in the repo ...

As you have seen, this does no good if the file is actually committed. That's because .gitignore does not mean "ignore this file"; it really means "shut up about this file if it's untracked". If it's not untracked (if it is tracked), listing the file has no effect at all.
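Here is a minimal sketch of that behavior, assuming a throwaway repository and a hypothetical file name settings.conf (neither comes from your question). Once the file has been committed, listing it in .gitignore does not stop git status from reporting changes to it:

```shell
#!/bin/sh
# Sketch: .gitignore has no effect on a file that is already tracked.
# The repository is a temporary one; settings.conf is a hypothetical name.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "setting=1" > settings.conf
git add settings.conf
git commit -q -m "commit settings.conf, so it is now tracked"

echo "settings.conf" > .gitignore
git add .gitignore
git commit -q -m "list settings.conf in .gitignore"

echo "setting=local-override" > settings.conf   # local edit to the tracked file
status=$(git status --porcelain)                # short, script-friendly status
echo "$status"                                  # the file is reported as modified anyway
```

The porcelain status line starts with " M", meaning the work-tree copy differs from the index copy, even though the file is listed in .gitignore.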

I've tried git update-index --assume-unchanged and git update-index --skip-worktree, but to no avail.

For confirmation, it would be helpful if you showed exactly where this goes wrong. For now, I'm assuming that where it goes wrong is that those commands appear to work—they do not complain at all—but a later git fetch && git merge complains that it would overwrite the contents of the file. (You may be spelling this git pull, but if so, I recommend splitting it into its two component Git commands until you really understand what each command does on its own.)

This is where things get complicated. We must understand Git's model for commits and merges. With this comes the role of the index (aka staging area aka cache), and the role of the work-tree, during a git merge.

How merge works

First, let's do a quick overview of commits and the merge process. You already know [1] that each commit has a complete snapshot of all of your committed files, and each commit contains the hash ID(s) of its parent commit(s). Hence if we were to draw the graph of commits after git fetch but before git merge runs, we might see this:

       G--H--I   <-- master (HEAD)
      /
...--F
      \
       J--K--L   <-- origin/master

[1] If you don't already know all of this, read some more about Git, e.g., the chapter on branching in the Git Book.


What git merge is going to do in this case is to find the shared commit, the common starting point, for both your commit I, to which your name master points, and their commit L, to which your origin/master points. Here, that's commit F.

Next, Git will compare the contents saved in commit F to the contents saved in your own latest commit, I. That tells Git what you changed. The comparison is the same as the one you can view if you run:

git diff --find-renames <hash of F> <hash of I>   # what we changed

Git will also compare the contents saved in commit F to the contents saved in their latest commit L. That tells Git what they changed:

git diff --find-renames <hash of F> <hash of L>   # what they changed
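To find the hash of F without reading the graph by eye, git merge-base can compute it. The sketch below builds a tiny diverged history in a temporary repository; the local branch named theirs stands in for origin/master, and all file names are made up for illustration:

```shell
#!/bin/sh
# Sketch: locate the merge base (commit F) and list what each side changed.
# The branch "theirs" is a stand-in for origin/master; file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > shared.txt
git add shared.txt
git commit -q -m "F: common starting point"
commit_f=$(git rev-parse HEAD)
ours=$(git symbolic-ref --short HEAD)     # master or main, depending on configuration

git checkout -q -b theirs
echo "their work" > theirs.txt
git add theirs.txt
git commit -q -m "L: their latest commit"

git checkout -q "$ours"
echo "our work" > ours.txt
git add ours.txt
git commit -q -m "I: our latest commit"

merge_base=$(git merge-base HEAD theirs)  # the shared commit, F
we_changed=$(git diff --find-renames --name-only "$merge_base" HEAD)
they_changed=$(git diff --find-renames --name-only "$merge_base" theirs)

echo "merge base:   $merge_base"
echo "we changed:   $we_changed"
echo "they changed: $they_changed"
```

In a real clone you would run git merge-base HEAD origin/master after git fetch and feed the result to the two git diff commands.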

Now you can see how git merge actually works: it combines what you changed with what they changed. Using the combined changes, Git extracts the contents saved with the base commit—commit F—and applies the combined changes to all of the files changed in the two sets of changes. The result, if all goes well, is the snapshot that should be committed as the merge; and Git will do that, committing the merge and adjusting your current branch:

       G--H--I
      /       \
...--F         M   <-- master (HEAD)
      \       /
       J--K--L   <-- origin/master
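The merge commit M in this picture can be inspected directly. As a small sketch (again in a temporary repository, with a local branch theirs standing in for origin/master and made-up file names), a successful merge produces a commit with two parents and a snapshot that contains both sides' changes:

```shell
#!/bin/sh
# Sketch: a clean merge produces commit M with two parents (I and L).
# The branch "theirs" is a stand-in for origin/master; file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > shared.txt
git add shared.txt
git commit -q -m "F: common starting point"
ours=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs
echo "their work" > theirs.txt
git add theirs.txt
git commit -q -m "L: their latest commit"
commit_l=$(git rev-parse HEAD)

git checkout -q "$ours"
echo "our work" > ours.txt
git add ours.txt
git commit -q -m "I: our latest commit"
commit_i=$(git rev-parse HEAD)

git merge -q --no-edit theirs              # creates merge commit M
parents=$(git show -s --format=%P HEAD)    # parent hashes of M, first parent first
echo "parents of M: $parents"
ls                                         # the snapshot has files from both sides
```

The first parent is the commit you were on (I) and the second is the commit you merged (L).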

The index and the work-tree

There is a fundamental problem with files that are frozen into commits: they are (a) frozen, and (b) in a special, compressed, Git-only form. The frozen part is great for source code management: these files will never change, so you can always get back your previous work, by checking out an old commit. The special compressed Git-only form is useful for keeping your storage space under control: since Git saves every version of every file ever, if they weren't special and compressed, you might run out of disk space pretty fast. But it creates a problem: How do you get at the frozen file? How can you change it?

Git's answer to this is the work-tree. Doing git checkout on some commit expands and thaws the files that are saved in that commit. The thawed-out, reconstituted files go into your work-tree, where you can work with them and change them.

In other version control systems, that's the end of the story right there: you have your frozen files, which you can't change, and your unfrozen work-tree, which you can. But Git adds this intermediate form, which Git calls the index, or the staging area, or the cache, depending on who / which part of Git is doing this calling.

Understanding the index is crucial to using Git, yet it's rarely explained very well. People (and IDEs) try to paper over it and hide it from you. This doesn't work, and the reason it doesn't work is important—especially in your case.

The best description I know of for the index is that it's what will go into your next commit. When Git is extracting the frozen files, instead of unfreezing and de-compressing them straight into your work-tree, it first just unfreezes them (or more precisely, collects them into a single unified list that isn't frozen—as opposed to the more structured, frozen lists inside commits). These now-unfrozen copies go into the index. They are still all Git-ified, compressed and taking minimal storage.

Once a file is unfrozen into the index, only then does Git de-compress it into the work-tree format. So it's first unfrozen (index copy), and then it's extracted into the work-tree. This means the index has a copy that's ready to be frozen into the next commit.

If you change the file in the work-tree, you must run git add on that file to copy (and compress and Git-ify) the file so that it fits into the index. Now the index copy matches the work-tree copy, except that the index copy is in the special Git-only form. Now it's ready to go into the next commit.

This is how git status works: for each file, it compares the work-tree copy to the index copy, and if those are different, it says the file is not staged for commit. It also compares the index copy—in the special Git-only format—to the HEAD commit copy, and if those are different, it says the file is staged for commit. So if there are 10,000 files in the work-tree and the index and the HEAD commit, there are actually 30,000 copies total (10k x 3 copies). But if only two of them are different in terms of these three copies, only two files get listed in git status (and the Git-ified copies are relatively tiny).
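The two comparisons that git status makes can be run separately, which may make the three-copy model easier to see. This is a sketch in a temporary repository with hypothetical files a.txt and b.txt: git diff --cached compares the index to HEAD, and plain git diff compares the work-tree to the index.

```shell
#!/bin/sh
# Sketch: the two comparisons behind git status, run one at a time.
# a.txt and b.txt are hypothetical file names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original a" > a.txt
echo "original b" > b.txt
git add a.txt b.txt
git commit -q -m "initial commit"

echo "a, changed and added" > a.txt
git add a.txt                               # index copy of a.txt now differs from HEAD
echo "b, changed but not added" > b.txt     # work-tree copy of b.txt differs from index

staged=$(git diff --cached --name-only)     # index vs HEAD: "staged for commit"
unstaged=$(git diff --name-only)            # work-tree vs index: "not staged for commit"

echo "staged:   $staged"
echo "unstaged: $unstaged"
```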

Until you run git commit, files that are different in the index are simply different in the index. When you do run git commit, Git freezes the index (without even looking at the work-tree!) and makes this your new HEAD commit. Your new commit now matches the index, so now all the index copies of files match their HEAD commit copies.
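A short sketch of the "commit freezes the index, not the work-tree" point, using a temporary repository and a hypothetical file notes.txt: the file is edited again after git add, and the commit still records the added version.

```shell
#!/bin/sh
# Sketch: git commit records the index copy, not the work-tree copy.
# notes.txt is a hypothetical file name.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first commit"

echo "version 2" > notes.txt
git add notes.txt                            # index copy is now "version 2"
echo "version 3, never added" > notes.txt    # work-tree copy moves on

git commit -q -m "second commit"             # freezes the index copy

committed=$(git show HEAD:notes.txt)
in_worktree=$(cat notes.txt)
echo "committed:   $committed"
echo "in worktree: $in_worktree"
```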

(Aside: during a conflicted merge, the index takes on an expanded role. Instead of just one copy of each file, it now holds up to three copies of each file. But we're not looking at conflicted merges here, so we don't have to worry about this.)

The assume-unchanged and skip-worktree bits

Now we can see what these two bits do. When you run git status, Git normally compares the work-tree copy of every file to the index copy of the same file. If it's different, Git says that you have a change that is not staged for commit. (If the file isn't even in the index at all, Git says the file is untracked, and then the .gitignore file matters. But if the file is already in the index, the file is tracked, and the .gitignore file does not matter.)

If you set either the assume-unchanged or the skip-worktree bit, git status won't compare the work-tree version of the file to the index version. They can be as different as you like, and git status will say nothing about them.

Note that git commit completely ignores these bits! It simply freezes the index copy. If the index copy matches the work-tree copy, this means you've kept the file the same when committing it again. Your new commit has the same frozen copy as your previous commit. Your index copy continues to match your HEAD commit copy.

The problem comes about when Git needs to change the file. Let's say that you have set the skip-worktree bit (this is the one you should set, in general, as the other is meant for a different problem, although in practice either one works). You have also modified the work-tree copy. Running git status won't complain, because git status won't actually compare the work-tree copy to the index copy any more.
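As a sketch of what the bit does (temporary repository, hypothetical file settings.conf, sparse checkout not enabled), the following sets skip-worktree, edits the file, and shows that git status stays quiet until the bit is cleared again. git ls-files -v marks skip-worktree entries with an S.

```shell
#!/bin/sh
# Sketch: the skip-worktree bit silences git status for one file.
# settings.conf is a hypothetical file name; sparse checkout is assumed to be off.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "setting=1" > settings.conf
git add settings.conf
git commit -q -m "track settings.conf"

git update-index --skip-worktree settings.conf
echo "setting=local-override" > settings.conf     # local edit

status_with_bit=$(git status --porcelain)         # nothing is reported
flags=$(git ls-files -v settings.conf)            # "S" marks skip-worktree entries

git update-index --no-skip-worktree settings.conf # clear the bit again
status_without_bit=$(git status --porcelain)      # the edit becomes visible

echo "with bit:    '$status_with_bit'"
echo "flags:       $flags"
echo "without bit: '$status_without_bit'"
```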

But now you run git merge, and the merge wants to take the changes to the file. Git compares commit F to commits I and L, for instance, and finds that although you have not committed a new version of the file in I, they did commit a new version of the file in L. So Git will take their changes, make those go into the new merge commit M, and then ... extract M into your work-tree, clobbering your copy of the file.

Clobbering your file is bad, so Git doesn't do that. Instead, it just fails the merge.
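This is my guess at what you are seeing, sketched in a temporary repository. The branch theirs stands in for origin/master, and settings.conf and ours.txt are made-up names. They change the file in a commit, you change it locally with the skip-worktree bit set, and the merge refuses to run rather than overwrite your copy. The exact error text varies between Git versions, so the script only checks the exit status.

```shell
#!/bin/sh
# Sketch: a merge that needs to update a locally modified file is refused,
# even when the skip-worktree bit is set on that file.
# "theirs" stands in for origin/master; file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "setting=1" > settings.conf
git add settings.conf
git commit -q -m "F: common starting point"
ours=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs
echo "setting=their-new-value" > settings.conf
git add settings.conf
git commit -q -m "L: they changed settings.conf"

git checkout -q "$ours"
echo "our work" > ours.txt
git add ours.txt
git commit -q -m "I: our commit, settings.conf untouched"

git update-index --skip-worktree settings.conf
echo "setting=my-local-value" > settings.conf     # uncommitted local edit

if merge_output=$(git merge --no-edit theirs 2>&1); then
    merge_result=succeeded
else
    merge_result=refused
fi

echo "merge $merge_result"
echo "$merge_output"
cat settings.conf                                 # local edit is still intact
```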

What should you do about this?

Ultimately, what you must do is save your version of the file somewhere. That could be inside Git—as a commit, for instance—or outside of Git, by copying the file outside the repository. Then you can either merge your changes with their changes, or re-do your changes after just taking their version.

This is what git stash does, in fact. It makes a commit (well, really two commits). The special thing about these commits is that they are not on any branch. Having made the commits, git stash runs git reset --hard to throw away your changes to the file from the index and work-tree. You had no index changes, and even if you did, the index copy is saved in the stash commits, so the --mixed part of the reset is safe; your work-tree copy is saved in the stash commits too, so the --hard part of the reset is safe. Now that your index and work-tree are clean, you can merge safely. Then git stash pop, which is really git stash apply && git stash drop, can merge your stashed work-tree version of the file with your current version of the file, using a less-safe internal merge that works only on the work-tree copy. The drop step drops the stash commits, so that they become unreferenced [2] and will eventually be removed entirely.
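The stash, merge, pop sequence can be sketched as follows, in a temporary repository with a local branch theirs standing in for origin/master and a hypothetical settings.conf. The example assumes the two edits touch lines far enough apart that the pop applies cleanly; if they overlapped, git stash pop would stop with a conflict and keep the stash.

```shell
#!/bin/sh
# Sketch: stash the local edit, merge their change, then re-apply the edit.
# "theirs" stands in for origin/master; settings.conf is a hypothetical file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'host=localhost\na=1\nb=2\nc=3\nd=4\ne=5\ndebug=false\n' > settings.conf
git add settings.conf
git commit -q -m "F: common starting point"
ours=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs
sed 's/debug=false/debug=true/' settings.conf > settings.new
mv settings.new settings.conf
git add settings.conf
git commit -q -m "L: they changed the last line"

git checkout -q "$ours"
sed 's/host=localhost/host=devbox/' settings.conf > settings.new
mv settings.new settings.conf                 # uncommitted local edit to the first line

git stash -q                                  # save the edit, clean index and work-tree
git merge -q --no-edit theirs                 # now the merge can update the file
git stash pop -q                              # re-apply the local edit on top

cat settings.conf                             # contains both changes
status=$(git status --porcelain)              # local edit is back, uncommitted
```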

There are several alternatives to using git stash here, but none are as simple, and none are pretty. You might as well use git stash.

Last, you can stop committing the file entirely. Once the file is no longer in the index, it becomes untracked and is not in any future commits. This is ultimately the best solution, in my opinion, but it has one really huge drawback: the file has been committed in the past. This means that if you ever check out an old commit (that does have the file), the file will be in both the current commit—that old commit you've just checked out—and the index, and will be tracked. When you switch away from that old commit to a new commit that doesn't have the file, that will remove the file! This is what you mean when you say:

I was thinking of doing git rm --cached, but from what I've read it seems like this will delete the file from the repo ...

Specifically, it deletes the file from the index, while leaving the work-tree copy alone. This does not delete the file from the repo, but the repo itself is mainly composed of commits, and "delete from the repo" is a nonsense phrase. You literally can't delete the file from the existing commits: they are frozen forever in time. You can only avoid putting the file into future commits.
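A sketch of what git rm --cached actually changes, in a temporary repository with a hypothetical settings.conf: the file leaves the index (and so stays out of the next commit), the work-tree copy remains, and the older commit still contains it.

```shell
#!/bin/sh
# Sketch: git rm --cached removes the index entry only.
# settings.conf is a hypothetical file name.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "setting=1" > settings.conf
git add settings.conf
git commit -q -m "old commit that contains settings.conf"

git rm -q --cached settings.conf              # drop it from the index only
git commit -q -m "stop tracking settings.conf"

tracked=$(git ls-files settings.conf)         # empty: no longer in the index
old_copy=$(git show HEAD~1:settings.conf)     # the earlier commit still has it
if [ -f settings.conf ]; then on_disk=yes; else on_disk=no; fi

echo "tracked now:  '$tracked'"
echo "in old commit: $old_copy"
echo "still on disk: $on_disk"
```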

This works, but leaves that trap I outlined above: going back to a historical commit restores the file to the index (because git checkout commit means populate the index from a commit and use that to populate the work-tree). Once in the index, it will be in future commits. Switching to a commit that doesn't have the file requires removing it from the index, which implies removing it from the work-tree, and now the work-tree copy is gone.

So, if you want to go this route, which is a good way to go, you should:

  • stop using that file at all: rename it to config.sample
  • switch to a new (different) file name for the actual configuration, and keep this file out of the repository entirely (store it in $HOME/.fooconfig, for instance)

and make that all part of one commit, after which the old configuration file is never used again. Tell people to move their configuration to the new location before switching to the new version of the foo program. Make this a major version bump, because the behavior is different.
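One way this migration might look, as a sketch only: the tracked file name config and the temporary directory standing in for $HOME are assumptions made for the example. The repository keeps config.sample, and the live configuration sits outside the work-tree, so nothing Git does on pull or checkout can touch it.

```shell
#!/bin/sh
# Sketch: rename the tracked file to config.sample and keep the live
# configuration outside the repository.
# "config" is a hypothetical tracked name; fake_home stands in for $HOME.
set -e
repo=$(mktemp -d)
fake_home=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "setting=default" > config
git add config
git commit -q -m "old layout: config is tracked"

git mv config config.sample
git commit -q -m "rename config to config.sample; live config moves out of the repo"

cp config.sample "$fake_home/.fooconfig"          # each user copies the sample once
echo "extra=my-local-value" >> "$fake_home/.fooconfig"

tracked=$(git ls-files)                           # only the sample is tracked
status=$(git status --porcelain)                  # local config edits are invisible to Git
echo "tracked: $tracked"
echo "status:  '$status'"
```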


[2] See Think Like (a) Git.

torek answered Oct 02 '22 00:10