A week or two ago I took some files that I had been archiving with a simple find|sed|tar|xz|gpg
bash script, unpacked them all, and put their contents in a git repo, committed, put the next archive's contents in the repo, committed (rinse and repeat), in order to have a nicer system.
All files were edited on one of my two computers, both running Arch Linux, in either TeXstudio or Vim.
I tried to check out an old version, but Git is flipping out: it won't let me due to changes that are outstanding. I tried everything I knew, and then went on Google to find out things I didn't know.
There are a number of other questions on this subject. Unfortunately, their answers have not helped me. For the sake of completeness, I'll list the questions.
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   Arcs/arc1.tex
        modified:   Arcs/arc2.tex
        modified:   Arcs/frontmatter.tex

no changes added to commit (use "git add" and/or "git commit -a")
Also, so people don't need to ask, I already did the obvious ones, listed below.
git reset --hard
git commit -a
git stash
git pull
as well as removing everything from the index and adding it back.
I'm not on Windows. Also, this shouldn't have anything to do with line endings, since I'm the only user; there is no reason for there to be weird line endings.
git reset --hard HEAD (among other possibilities)
git stash
git stash drop
git config core.autocrlf input
git rm --cached -r .
git reset --hard
git add .
git commit -m "Normalize line endings"
Not only did this not work, but it also increased the number of files that are misbehaving, and wrote 700+ lines to a file for... reasons. It wasn't even the file that was misbehaving.
More line-ending stuff:
git clean -df
git checkout -- .
git checkout -- ./.
git checkout-index -a -f
git checkout --force master
I tried committing the changes with git commit -am "WORK DAMN YOU!"
then git revert --hard HEAD^
I also tried pulling from my private remote, but was just told that the local repo was already up to date.
This is extremely frustrating.
Try git rm --cached -r .
after git reset --hard
That was the only solution that worked for me. Hope it will help somebody!
According to a comment:

It was .git/info/attributes. ... Why was that having such an effect? ... I need those [*.tex] filters ...
You can use them. You just have to be aware that Git doesn't understand them, and you may want to tweak them in some fashion. Unfortunately there are few good alternatives for doing said tweaking.
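For context, a filter setup of this kind has two pieces: an attributes line mapping a pattern to a filter name, and a config section defining the filter's commands. The filter name texfix and its commands below are invented for illustration; the question doesn't say what the real *.tex filters do:

```
# .git/info/attributes: map *.tex files to a filter named "texfix"
*.tex   filter=texfix

# .git/config (or ~/.gitconfig): define the filter's commands
[filter "texfix"]
    # hypothetical commands: strip trailing whitespace on the way in, no-op on the way out
    clean  = sed -e 's/[[:space:]]*$//'
    smudge = cat
```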
The way that filters—called clean and smudge filters—work is directly related to the way that core.autocrlf and end-of-line mangling work. To understand them yourself, start with a few simple facts:
The content of any Git object—commit, tree, blob, or annotated tag—literally cannot be changed. This is because the content is retrieved by its database key, which is a hash ID (currently SHA-1, in the future perhaps SHA-3 or some other very good hash) that must match the computed hash of the content.
You retrieve a commit by its hash ID. A branch name like master or develop just contains the actual hash ID of the latest commit on that branch.
Each commit stores the raw hash ID of its parent commit, as part of its content, and stores the raw hash ID of the tree object that leads to the blob objects and thus produces the snapshot for that commit.
To store a new object into the database, you feed the object into git hash-object -w (or Git does this internally on its own). Git now computes the hash of the content, including the header that gives the object's type and size, and stores the value—the content—into the database and emits the key. You may then use the key in the future to retrieve the content. At that time, Git re-checks the hash: it must match the key. If it does not match, the data have been corrupted, and Git stops.
Hence, the commit hash must match the commit contents, which give the tree hash for the tree contents, which give the blob hashes for the blob contents. If the commit is itself not the tip of a branch, the commit was found by walking back through a tip commit to some number of previous commits, all by their hash IDs. The resulting data structure is a Merkle Tree that provides Git's data-integrity guarantees.
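You can check both claims yourself in a throwaway repository (a sketch assuming git and sha1sum are on your PATH; the repo name demo and the file name file.txt are arbitrary):

```shell
# reproduce Git's blob hashing by hand: SHA-1 over "blob <size>\0<content>"
printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1
printf 'hello\n' | git hash-object --stdin   # should print the same ID

# then walk the commit -> tree -> blob chain
git init -q demo
git -C demo config user.email you@example.com
git -C demo config user.name "You"
echo hello > demo/file.txt
git -C demo add file.txt
git -C demo commit -qm 'first commit'
git -C demo cat-file -p HEAD            # commit object: names its tree's hash
git -C demo cat-file -p 'HEAD^{tree}'   # tree object: names file.txt's blob hash
```

Change one byte of file.txt and every hash up the chain changes, which is the Merkle-tree property described above.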
This means that any filtering cannot be done on already-committed content. And yet, it must be done on already-committed content, so that Windows users can have CRLF line endings, for instance. How is Git to resolve this paradox?
The answer lies in another several facts about Git:
You cannot work directly with commit contents. They need to be extracted into a working area, called the work-tree. The work-tree (or working tree or however you prefer to spell it) has the extracted files in de-compressed form, where they can be read and written.
But Git adds an intermediate data structure as well, which Git originally just called the index. This was not a very good name, so this data structure wound up with three names: it's the index, the staging area, and the cache. This index keeps tabs on the work-tree, caching (hence the third name) stat system call data, for instance. Each file from the current commit is first extracted into the index, keeping it in its special compressed form—actually, just using the raw blob hash ID directly—so that the index has, or really, has a reference to, the copy of the file in the commit.
Running git add on a file copies the file into the index (really, adding it as a blob object into the main database and computing its hash ID, then updating the hash ID in the index). This means that the index is, at all times, the image that Git will use for the next commit you can make. This is where it gets the name staging area. Because you can overwrite index files with git add, they are writable here, where they are not writable in commits.
Running git commit packages the current index into a tree object, freezing it for all time—the blob hashes are no longer changeable—and uses the tree object to make the new commit.
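You can watch the index hold a blob's raw hash, and see that same hash frozen into the commit's tree (a sketch; the repo and file names are made up):

```shell
git init -q stagedemo
git -C stagedemo config user.email you@example.com
git -C stagedemo config user.name "You"
echo hello > stagedemo/a.tex
git -C stagedemo add a.tex
git -C stagedemo ls-files --stage            # index entry: mode, blob hash, stage, path
git -C stagedemo commit -qm snapshot
git -C stagedemo cat-file -p 'HEAD^{tree}'   # the same blob hash, now frozen in the tree
```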
This index is how Git gets a lot of its speed, as compared to other version control systems. Since the index keeps track of the work-tree, Git can do a lot of things much faster than usual: git status, for instance, can call stat on a directory or file and compare the result to cached stat data, without having to read the file itself.
(The index also takes on an expanded role during conflicted merges. This isn't relevant to clean and smudge filters and LF/CRLF wars, but is worth mentioning while we're talking about the index. Instead of just one entry per file-to-be-committed, the index can hold three not-to-be-committed entries: one from the merge base, and one from each of the two branch tips being merged.)
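A quick way to see those three extra entries is to manufacture a conflict and list the unmerged index entries (a sketch; the repo, branch, and file names are arbitrary):

```shell
git init -q mergedemo
git -C mergedemo config user.email you@example.com
git -C mergedemo config user.name "You"
echo base > mergedemo/f.txt
git -C mergedemo add f.txt
git -C mergedemo commit -qm base
git -C mergedemo checkout -qb side
echo side > mergedemo/f.txt
git -C mergedemo commit -qam side
git -C mergedemo checkout -q -            # back to the original branch
echo other > mergedemo/f.txt
git -C mergedemo commit -qam other
git -C mergedemo merge side || true       # conflict expected
git -C mergedemo ls-files -u              # stage 1 (base), 2 (ours), 3 (theirs)
```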
We are now ready to see how filtering really works. Let's summarize the key points about commits, the index, and the work-tree:
- git checkout copies a commit's tree to the index, after which it exactly matches the commit's tree but in a form more suitable to keep track of the work-tree.
- git checkout also copies each commit's file to the work-tree, while updating the index slot for that file.
- git add copies a file from the work-tree back into the index, so that a future git commit can just freeze the index.

Now, remember, a smudge filter is applied to committed content, as it's turned into a work-tree file. A clean filter is applied to work-tree content, as it's turned into committed—or at least, to-be-committed—content. The smudge filter time is when LF-only line endings can become CRLF line endings for Windows users, and the clean filter time is when CRLF line endings can turn back to LF-only line endings.
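Here is a minimal clean filter in action (a sketch: the filter name crlf and the tr command are invented for illustration). The work-tree file keeps its CRLF ending, but the blob staged in the index is LF-only:

```shell
git init -q filterdemo
printf '*.tex filter=crlf\n' > filterdemo/.git/info/attributes
git -C filterdemo config filter.crlf.clean  "tr -d '\r'"   # runs on add: CRLF -> LF
git -C filterdemo config filter.crlf.smudge cat            # runs on checkout: no-op here
printf 'hello\r\n' > filterdemo/a.tex
git -C filterdemo add a.tex
git -C filterdemo cat-file -p :a.tex | od -c   # staged blob: no \r
od -c filterdemo/a.tex                         # work-tree file: \r still there
```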
The ideal time to apply a smudge filter is while the file is being expanded, i.e., copied from the index to the work-tree. The ideal time to apply a clean filter is while the file is being compressed, i.e., copied from the work-tree to the index. So this is when Git does it.
At the same time, though, one of the key features of the index is speed. So Git assumes that applying the smudge filter doesn't "change" the file, in some sense. The content in the work-tree file may not match the decompressed blob any more, but—at least by intent and purpose—it still matches what you would get by cleaning and re-compressing the work-tree file.
The rub comes in when this isn't true. What if cleaning and re-compressing the file results in different content, with a different hash ID? The answer is that Git may notice, and yet Git may not notice, all depending on the vagaries of the effectiveness of the index-as-cache and the stat data saved in the index, vs the stat data delivered by a later system call.
If the smudge and clean filters are perfect mirrors—so that a smudged and re-cleaned file always matches the original—you can git add the file after extraction, and Git will update the saved stat data. As long as that does not change again, Git will now believe that the file is clean. If the underlying file system has unreliable stat data, you can use the index's assume-unchanged bit to force Git to think that the file is clean anyway. This is pretty crude and not a pleasing solution, but it will do the job.
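The assume-unchanged bit can be set and inspected like this (a sketch with made-up repo and file names):

```shell
git init -q audemo
echo x > audemo/f.tex
git -C audemo add f.tex
git -C audemo update-index --assume-unchanged f.tex
git -C audemo ls-files -v f.tex    # lowercase "h" marks assume-unchanged
git -C audemo update-index --no-assume-unchanged f.tex   # and undo it
```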