
Line endings messed up in Git - how to track changes from another branch after a huge line ending fix?

We are working with a 3rd party PHP engine that gets regular updates. The releases are kept on a separate branch in git, and our fork is the master branch.

This way we'll be able to apply patches to our fork from the new releases of the engine.

My problem is, after many commits to our branch, I realized that the initial import of the engine was done with CRLF line endings.

I converted every file to LF, but this made a huge commit, with 100k lines removed and 100k lines added, which obviously breaks what we intended to do: easily merge in patches from the factory releases of that 3rd party engine.

What should I do now? How can I fix this? I already have hundreds of commits on our fork.

Ideally, I would make a line-ending fix commit right after the initial import, before our own fork branches off, and then remove the huge line-ending commit from later in the history.

However I have no idea how to do this in Git.

Thanks!

keo asked Jun 18 '09 10:06



3 Answers

We are avoiding this problem in the future as follows:

1) Everyone uses an editor that strips trailing whitespace, and we save all files with LF.

2) If 1) fails (it can: someone accidentally saves a file with CRLF for whatever reason), we have a pre-commit script that checks for CRLF characters:

#!/bin/bash
#
# An example hook script to verify what is about to be committed.
# Called by git-commit with no arguments.  The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# To enable this hook, rename this file to "pre-commit" and set executable bit

# original by Junio C Hamano

# modified by Barnabas Debreceni to disallow CR characters in commits


if git rev-parse --verify HEAD >/dev/null 2>&1
then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi

crlf=0

IFS="
"
for FILE in `git diff-index --cached --diff-filter=ACM $against`
do
    fhash=`echo $FILE | cut -d' ' -f4`
    fname=`echo $FILE | cut -f2`

    if git show $fhash | grep -EUIlq $'\r$'
    then
        echo $fname contains CRLF characters
        crlf=1
    fi
done

if [ $crlf -eq 1 ]
then
    echo Some files have CRLF line endings. Please fix it to be LF and try committing again.
    exit 1
fi

exec git diff-index --check --cached $against --

This script uses GNU grep and works on Mac OS X. Test it before using it on other platforms, though: we had problems with Cygwin and BSD grep.
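If GNU grep is not available, the CR check itself can be done with plain POSIX tools. This is only a sketch, not a drop-in replacement for the hook: the helper name count_cr and the sample file names are made up for illustration, it counts every CR byte (not only those at line ends), and unlike grep -I it does not skip binary files.

```shell
#!/bin/sh
# Hypothetical helper: count carriage-return bytes on standard input.
count_cr() {
    # Delete everything except CR, then count what is left.
    tr -cd '\r' | wc -c | tr -d ' '
}

# Build two sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'first\r\nsecond\r\n' > "$tmpdir/crlf.txt"
printf 'first\nsecond\n' > "$tmpdir/lf.txt"

# In a hook, the input would come from `git show $fhash` instead of a file.
crlf_count=$(count_cr < "$tmpdir/crlf.txt")
lf_count=$(count_cr < "$tmpdir/lf.txt")

if [ "$crlf_count" -gt 0 ]; then
    echo "crlf.txt contains CR characters"
fi
if [ "$lf_count" -eq 0 ]; then
    echo "lf.txt is clean"
fi

rm -rf "$tmpdir"
```

Inside the hook loop, the condition would then read something like `[ "$(git show $fhash | count_cr)" -gt 0 ]`.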

3) In case we find any whitespace errors, we use the following script on erroneous files:

#!/usr/bin/env php
<?php

    // Remove various whitespace errors and convert to LF from CRLF line endings
    // written by Barnabas Debreceni
    // licensed under the terms of WTFPL (http://en.wikipedia.org/wiki/WTFPL)

    // handle no args
    if( $argc <2 ) die( "nothing to do" );


    // blacklist

    $bl = array( 'smarty' . DIRECTORY_SEPARATOR . 'templates_c' . DIRECTORY_SEPARATOR . '.*' );

    // whitelist

    $wl = array(    '\.tpl', '\.php', '\.inc', '\.js', '\.css', '\.sh', '\.html', '\.txt', '\.htc', '\.afm',
                    '\.cfm', '\.cfc', '\.asp', '\.aspx', '\.ascx' ,'\.lasso', '\.py', '\.afp', '\.xml',
                    '\.htm', '\.sql', '\.as', '\.mxml', '\.ini', '\.yaml', '\.yml'  );

    // remove $argv[0]
    array_shift( $argv );

    // make file list
    $files = getFileList( $argv );

    // sort files
    sort( $files );

    // filter them for blacklist and whitelist entries

    $filtered = preg_grep( '#(' . implode( '|', $wl ) . ')$#', $files );
    $filtered = preg_grep( '#(' . implode( '|', $bl ) . ')$#', $filtered, PREG_GREP_INVERT );

    // fix whitespace errors
    fix_whitespace_errors( $filtered );





    ///////////////////////////////////////////////////////////////////////////////////////////////
    ///////////////////////////////////////////////////////////////////////////////////////////////


    // whitespace error fixer
    function fix_whitespace_errors( $files ) {
        foreach( $files as $file ) {

            // read in file
            $rawlines = file_get_contents( $file );

            // remove \r
            $lines = preg_replace( "/(\r\n)|(\n\r)/m", "\n", $rawlines );
            $lines = preg_replace( "/\r/m", "\n", $lines );

            // remove spaces from before tabs
            $lines = preg_replace( "/\040+\t/m", "\t", $lines );

            // remove spaces from line endings
            $lines = preg_replace( "/[\040\t]+$/m", "", $lines );

            // remove tabs from line endings
            $lines = preg_replace( "/\t+$/m", "", $lines );

            // remove EOF newlines
            $lines = preg_replace( "/\n+$/", "", $lines );

            // write file if changed and set old permissions
            // (compare contents, not lengths: turning a lone \r into \n keeps the length)
            if( $lines !== $rawlines ){

                $perms = fileperms( $file );

                // Uncomment to save original files

                //rename( $file, $file.".old" );
                file_put_contents( $file, $lines);
                chmod( $file, $perms );
                echo "{$file}: FIXED\n";
            } else {
                echo "{$file}: unchanged\n";
            }

        }
    }

    // get file list from argument array
    function getFileList( $argv ) {
        $files = array();
        foreach( $argv as $arg ) {
            // is a directory
            if( is_dir( $arg ) )  {
                $files = array_merge( $files, getDirectoryTree( $arg ) );
            }
            // is a file
            if( is_file( $arg ) ) {
                $files[] = $arg;
            }
        }
        return $files;
    }

    // recursively scan directory
    function getDirectoryTree( $outerDir ){
        // strip trailing separators (rtrim also works when the separator is a backslash)
        $outerDir = rtrim( $outerDir, DIRECTORY_SEPARATOR );
        $dirs = array_diff( scandir( $outerDir ), array( ".", ".." ) );
        $dir_array = array();
        foreach( $dirs as $d ){
            if( is_dir( $outerDir . DIRECTORY_SEPARATOR . $d ) ) {
                $otherdir = getDirectoryTree( $outerDir . DIRECTORY_SEPARATOR . $d );
                $dir_array = array_merge( $dir_array, $otherdir );
            }
            else $dir_array[] = $outerDir . DIRECTORY_SEPARATOR . $d;
        }
        return $dir_array;
    }
?>
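For a quick one-off fix without PHP, the CRLF-to-LF step alone can be approximated in shell. This is a rough sketch under a few assumptions: the function name to_lf and the sample file are hypothetical, only a CR directly before a newline is removed (lone CR characters are left alone), and none of the other whitespace clean-ups from the script above are attempted.

```shell
#!/bin/sh
# Hypothetical helper: convert CRLF line endings to LF in place.
to_lf() {
    file=$1
    tmp=$(mktemp)
    cr=$(printf '\r')
    # Strip a CR only when it sits at the very end of a line.
    sed "s/${cr}\$//" "$file" > "$tmp"
    if cmp -s "$file" "$tmp"; then
        echo "$file: unchanged"
    else
        # Overwrite the contents instead of moving the temp file,
        # so the original file keeps its permissions.
        cat "$tmp" > "$file"
        echo "$file: FIXED"
    fi
    rm -f "$tmp"
}

# Try it on a throwaway file.
tmpdir=$(mktemp -d)
printf 'a\r\nb\r\n' > "$tmpdir/sample.php"
first=$(to_lf "$tmpdir/sample.php")
second=$(to_lf "$tmpdir/sample.php")
echo "$first"
echo "$second"

# Count the CR bytes that remain after the conversion.
remaining_cr=$(tr -cd '\r' < "$tmpdir/sample.php" | wc -c | tr -d ' ')
rm -rf "$tmpdir"
```

The first call should report the file as FIXED and the second as unchanged, mirroring the output of the PHP script.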
keo answered Oct 04 '22 14:10


Did you look at git rebase?

You will need to rebase the history of your repository, as follows:

  • commit the line terminator fixes
  • start the rebase
  • leave the third-party import commit first
  • apply the line terminator fixes
  • apply your other patches

Be aware that this rewrites history, so it will break all downstream repositories, i.e. those cloned from your parent repo. Ideally, you will start from scratch with those.


Update: sample usage:

target=`git rev-list --max-count=4 HEAD | tail -n1`
git rebase -i $target

This starts an interactive rebase session for the last 3 commits: $target is the 4th commit back, and everything after it goes into the todo list.
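To get a feel for the reordering before trying it on the real repository, it can be rehearsed in a throwaway repo. The following is a sketch under simplifying assumptions: all file names and commit messages are invented, the fork commit touches a different file than the line-ending fix (so nothing conflicts), and a small script stands in for the interactive editor via GIT_SEQUENCE_EDITOR. In a real history, fork commits that touch the converted files were recorded against CRLF content and may well conflict when replayed after the fix.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config core.autocrlf false

# 1. Third-party import with CRLF line endings.
printf 'one\r\ntwo\r\n' > engine.php
git add engine.php
git commit -q -m "import engine"

# 2. A commit on the fork that touches a different file.
printf 'ours\n' > fork.php
git add fork.php
git commit -q -m "fork change"

# 3. The late line-ending fix.
printf 'one\ntwo\n' > engine.php
git commit -q -a -m "convert to LF"

# Stand-in for the interactive editor: swap the first two todo lines,
# which moves the line-ending fix directly after the import.
editor=$(mktemp)
cat > "$editor" <<'EOF'
todo=$1
{ sed -n 2p "$todo"; sed -n 1p "$todo"; } > "$todo.new"
mv "$todo.new" "$todo"
EOF

# Rebase everything after the root (import) commit.
root=$(git rev-list --max-parents=0 HEAD)
GIT_SEQUENCE_EDITOR="sh $editor" git rebase -q -i "$root"

# Show the commit subjects, oldest first.
history=$(git log --reverse --format=%s)
echo "$history"
```

After the rebase, the log should list the import first, then the line-ending fix, then the fork change. When doing this by hand, leave GIT_SEQUENCE_EDITOR unset and move the line in the editor that Git opens.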

Robert Munteanu answered Oct 04 '22 13:10


Going forward, avoid this problem with the core.autocrlf setting, documented in git config --help:

core.autocrlf

If true, makes git convert CRLF at the end of lines in text files to LF when reading from the filesystem, and convert in reverse when writing to the filesystem. The variable can be set to input, in which case the conversion happens only while reading from the filesystem but files are written out with LF at the end of lines. A file is considered "text" (i.e. be subjected to the autocrlf mechanism) based on the file's crlf attribute, or if crlf is unspecified, based on the file's contents. See gitattributes.
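To see what the input setting does, here is a small throwaway experiment. It is only a sketch: the repository lives in a temporary directory, the file name sample.txt is made up, and it assumes no other line-ending settings (such as a .gitattributes file or a strict core.safecrlf) are in effect.

```shell
#!/bin/sh
set -e

# Throwaway repository with conversion on the way in only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf input

# Create a file with CRLF line endings and stage it.
printf 'one\r\ntwo\r\n' > sample.txt
git add sample.txt 2>/dev/null   # Git may warn that CRLF will be replaced by LF

# Count CR bytes in the working tree file and in the staged blob.
worktree_cr=$(tr -cd '\r' < sample.txt | wc -c | tr -d ' ')
index_cr=$(git cat-file -p :sample.txt | tr -cd '\r' | wc -c | tr -d ' ')

echo "CR bytes in working tree: $worktree_cr"
echo "CR bytes in staged blob:  $index_cr"
```

The working tree file keeps its two CR bytes, while the staged blob should contain none, which is the "conversion only while reading from the filesystem" behaviour described above.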

Greg Bacon answered Oct 04 '22 13:10