
Trying to fix line-endings with git filter-branch, but having no luck


The easiest way to fix this is to make one commit that fixes all the line endings. Assuming you don't have any modified files, you can do this as follows.

# From the root of your repository remove everything from the index
git rm --cached -r .

# Change the autocrlf setting of the repository (you may want
# to use true on Windows):
git config core.autocrlf input

# Re-add all the deleted files to the index
# (You should get lots of messages like:
#   warning: CRLF will be replaced by LF in <file>.)
git diff --cached --name-only -z | xargs -0 git add

# Commit
git commit -m "Fixed crlf issue"

# If you're doing this on a Unix/Mac OSX clone then optionally remove
# the working tree and re-check everything out with the correct line endings.
git ls-files -z | xargs -0 rm
git checkout .
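
To check which files are still stored with CRLF before or after that commit, recent versions of git offer git ls-files --eol, which reports the line endings of the index copy (i/...) and the working-tree copy (w/...) of each tracked file. A minimal sketch, assuming a POSIX shell; filter_crlf_entries and crlf_in_index are names made up for illustration:

```shell
# Keep only entries whose index copy ("i/...") has CRLF or mixed endings.
# Reads `git ls-files --eol` output on stdin; the path follows a tab.
filter_crlf_entries() {
    awk -F '\t' '$1 ~ /^i\/(crlf|mixed) / { print $2 }'
}

# List tracked files that are stored with CRLF (or mixed) endings.
# Any arguments (for example a pathspec) are passed on to git ls-files.
crlf_in_index() {
    git ls-files --eol "$@" | filter_crlf_entries
}
```

Running crlf_in_index from the repository root should print nothing once the normalization commit is in place.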

The git documentation for gitattributes now documents another approach for "fixing" or normalizing all the line endings in your project. Here's the gist of it:

$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git status        # Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"

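If any files that should not be normalized show up in git status, unset their text attribute in .gitattributes (as in the line below) and stage again (git add --renormalize . here, or git add -u in the older procedure further down) before committing.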

manual.pdf -text

Conversely, text files that git does not detect can have normalization enabled manually.

weirdchars.txt text

This leverages a new --renormalize flag added in git v2.16.0, released Jan 2018. For older versions of git, there are a few more steps:

$ echo "* text=auto" >>.gitattributes
$ rm .git/index     # Remove the index to force git to
$ git reset         # re-scan the working directory
$ git status        # Show files that will be normalized
$ git add -u
$ git add .gitattributes
$ git commit -m "Introduce end-of-line normalization"
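
If a script has to work with both old and new git, the choice between the two procedures above can be automated. A rough sketch, assuming a POSIX shell, that it runs from the repository root, and that .gitattributes has already been written; supports_renormalize and normalize_line_endings are hypothetical helper names, not git commands:

```shell
# Succeed when the given version (default: the installed git) is 2.16 or newer.
supports_renormalize() {
    version=${1:-$(git --version | awk '{print $3}')}
    major=${version%%.*}      # "2.39.2" -> "2"
    rest=${version#*.}        # "2.39.2" -> "39.2"
    minor=${rest%%.*}         # "39.2"   -> "39"
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 16 ]; }
}

# Stage the normalized files with whichever procedure the installed git supports.
normalize_line_endings() {
    if supports_renormalize; then
        git add --renormalize .
    else
        rm .git/index          # force git to re-scan the working directory
        git reset
        git add -u
    fi
    git add .gitattributes
}
```

After normalize_line_endings, review git status and commit as in the steps above.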

My procedure for dealing with the line endings is as follows (battle tested on many repos):

When creating a new repo:

  • put .gitattributes in the very first commit along with other typical files such as .gitignore and README.md

When dealing with an existing repo:

  • Create / modify .gitattributes accordingly
  • git commit -a -m "Modified gitattributes"
  • git rm --cached -r . && git reset --hard && git commit -a -m 'Normalize CRLF' -n
    • -n (--no-verify) skips pre-commit hooks
    • I have to do it often enough that I defined it as an alias: alias fixCRLF="..."
  • repeat the previous command
    • yep, it's voodoo, but generally I have to run the command twice, first time it normalizes some files, second time even more files. Generally it's probably best to repeat until no new commit is created :)
  • go back-and-forth between the old (just before normalization) and new branch a few times. After switching the branch, sometimes git will find even more files that need to be renormalized!
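
The "repeat until no new commit is created" step can be wrapped in a small function. This is only a sketch of the commands listed above, assuming a POSIX shell; fix_crlf and its round limit are names invented here, not the actual alias. Because git reset --hard discards uncommitted changes, .gitattributes has to be committed first, as in the steps above.

```shell
# Repeat the normalize command until it stops producing commits.
# $1 is a safety limit on the number of rounds (default 5).
# Prints how many normalization commits were created.
fix_crlf() {
    max_rounds=${1:-5}
    rounds=0
    while [ "$rounds" -lt "$max_rounds" ]; do
        git rm --cached -r -q . || return 1
        git reset -q --hard || return 1
        # git commit exits non-zero when there is nothing to commit,
        # which ends the loop. -n skips pre-commit hooks.
        git commit -a -q -n -m 'Normalize CRLF' >/dev/null || break
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}
```

Calling fix_crlf in a repository that is already normalized should print 0 and leave HEAD untouched.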

In .gitattributes I explicitly declare all text files as having LF line endings, since Windows tooling is generally compatible with LF while non-Windows tooling is not compatible with CRLF (even many Node.js command-line tools assume LF and can therefore change the EOL in your files).

Contents of .gitattributes

My .gitattributes usually looks like:

*.html eol=lf
*.js   eol=lf
*.json eol=lf
*.less eol=lf
*.md   eol=lf
*.svg  eol=lf
*.xml  eol=lf

To figure out which distinct extensions are tracked by git in the current repo, list the tracked files with git ls-files and collect their extensions.
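
One way to collect those extensions, as a sketch assuming a POSIX shell with sed and sort; distinct_extensions is a made-up helper name. It ignores files without an extension and dotfiles such as .gitignore:

```shell
# Read paths on stdin and print each distinct file extension once.
# A path only counts if its basename has a non-empty part before the last dot.
distinct_extensions() {
    sed -n 's/.*[^/]\.\([^./][^./]*\)$/\1/p' | sort -u
}
```

Usage from the repository root: git ls-files | distinct_extensions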

Issues after normalization

Once this is done, there's one more common caveat though.

Say your master is already up to date and normalized, and then you check out outdated-branch. Quite often, right after checking out that branch, git marks many files as modified.

The solution is to do a fake commit (git add -A . && git commit -m 'fake commit') and then git rebase master. After the rebase, the fake commit should go away.


git status --short|grep "^ *M"|awk '{print $2}'|xargs fromdos

Explanation:

  • git status --short

    This displays one line per changed or untracked file. Files that are not under git control are marked at the beginning of the line with '??'. Files that are modified are marked with an M.

  • grep "^ *M"

    This keeps only those files that have been modified.

  • awk '{print $2}'

    This shows only the filename without any markers (it splits on whitespace, so it assumes the path contains no spaces).

  • xargs fromdos

    This takes the filenames from the previous command and runs them through the utility 'fromdos' to convert the line-endings.
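
For paths that contain spaces, a variant that passes NUL-separated names might be safer. This is a sketch of a different pipeline rather than the one explained above: it uses git ls-files -m, which only lists files with unstaged modifications (including unstaged deletions), and convert_modified is a hypothetical name. The converter defaults to fromdos and can be swapped for dos2unix:

```shell
# Run a line-ending converter on every tracked file with unstaged changes.
# $1 is the converter command (default: fromdos).
convert_modified() {
    converter=${1:-fromdos}
    # -z and -0 keep paths with spaces or newlines intact.
    git ls-files -m -z | xargs -0 "$converter"
}
```

Passing echo as the converter (convert_modified echo) gives a dry run that only prints the affected paths.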


Here's how I fixed all line endings in the entire history using git filter-branch. The ^M character needs to be entered using CTRL-V + CTRL-M. I used dos2unix to convert the files since this automatically skips binary files.

$ git filter-branch --tree-filter 'grep -IUrl "^M" | xargs -I {} dos2unix "{}"'
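
If typing a literal ^M is awkward, the carriage return can be generated with printf instead. A small sketch of just the search step, assuming a grep that supports -I and -U (GNU grep does); files_with_cr is an illustrative name:

```shell
# List text files under a directory (default: .) that contain a carriage return.
files_with_cr() {
    cr=$(printf '\r')          # command substitution keeps the CR character
    grep -IUrl "$cr" "${1:-.}"
}
```

The function itself is not visible inside the filter, but the same trick can be written inline, which should behave like the command above: git filter-branch --tree-filter 'grep -IUrl "$(printf "\r")" . | xargs -I {} dos2unix "{}"'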