Git messed up my files, showing Chinese characters in some places

Disclaimer: by "Git", I mean I messed up.

Earlier, I wanted git-gui to show me diffs for files that it thinks are binary.

So I made some changes to my .\.gitattributes:

*.ini       text
*.inc       text

But it didn't work. Then I made some changes to my .\.git\info\attributes:

*.ini       text
*.inc       text
*.inc crlf diff
*.ini crlf diff

and it worked.

But now, when I go back to previous commits, the files get messed up:

[screenshot: Chinese characters]

This is how it should look:

[screenshot: English characters]

It doesn't happen in all the files. EDIT: It happens only in files that have any special characters in them.

Q: Is the problem in the commits themselves, or just in some setting?
Q: Can I recover?

asked Jul 07 '13 19:07 by laggingreflex


3 Answers

Your ini files are saved in UTF-16LE, the encoding that Windows misleadingly describes as ‘Unicode’.

Git's default diffing tools don't work on UTF-16, because it's not an ASCII-compatible encoding. This is why git detected the files as binary originally.
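To illustrate why the files were classified as binary: as far as I know, Git's built-in heuristic treats content as binary when a NUL byte appears near the start, and UTF-16 stores every ASCII character with an accompanying 0x00 byte. The Python sketch below is a simplified model, not Git's actual code. The function name `looks_binary`, the 8000-byte window, and the sample INI content are assumptions made for illustration.

```python
def looks_binary(data: bytes, window: int = 8000) -> bool:
    """Simplified model: a NUL byte near the start marks the content as binary."""
    return b"\x00" in data[:window]


# Hypothetical INI content, used only for this illustration.
sample = "[Rainmeter]\r\nUpdate=1000\r\n"

# ASCII-range text encoded as UTF-8 contains no NUL bytes.
print(looks_binary(sample.encode("utf-8")))

# The same text in UTF-16LE has a 0x00 byte after every ASCII character.
print(looks_binary(sample.encode("utf-16-le")))
```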

LF/CRLF newline conversion is seeing each 0x0A byte as being a newline, and replacing it with 0x0D-0x0A. But, in a UTF-16LE file, a newline is actually signalled by 0x0A-0x00, and replacing that with 0x0D-0x0A-0x00 means that you've got an odd number of bytes, so the alignment of each two-byte code unit in the next line is out of sync. Consequently every other line gets mangled.
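The misalignment described above can be reproduced on a few in-memory bytes. This is a minimal Python sketch, assuming the conversion behaves like a plain byte-level replacement of 0x0A with 0x0D 0x0A. The helper name `naive_lf_to_crlf` and the sample text are made up for the example.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Byte-level LF -> CRLF conversion that knows nothing about the text encoding."""
    return data.replace(b"\n", b"\r\n")


original = "ab\ncd\n".encode("utf-16-le")  # 12 bytes: two bytes per character
converted = naive_lf_to_crlf(original)     # 14 bytes: one extra 0x0D per newline
mangled = converted.decode("utf-16-le")    # decode the shifted bytes as UTF-16LE again

print(original.hex(" "))
print(converted.hex(" "))
print([f"U+{ord(ch):04X}" for ch in mangled])
```

After the first inserted byte, the characters "c" and "d" decode as U+6300 and U+6400, which fall in the CJK ideograph range. The byte inserted at the second newline restores the alignment, which is why only every other line is affected.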

Your options are:

  1. Revert the attribute change and let Git handle the files as binary (losing the benefit of diffs).

  2. Save the files in an ASCII-compatible encoding. It looks like your content doesn't actually have any non-ASCII characters in, so hopefully that's not a problem? Normally you would want to save all your files as UTF-8 - this is ASCII-compatible but also allows all Unicode characters to be used. But that depends on whether Rainmeter supports reading INI files encoded like that (probably not).

  3. Configure git to use a different diff tool, though this will make it more complicated for others to work with your repo.

answered Oct 11 '22 20:10 by bobince


I had a similar problem recently. We have a project-wide .gitattributes file at the root level, which includes the lines:-

* text=auto
*.sql     text

One of our team was writing SQL code using SQL Management Studio which, unknown to him, was saving the files as UTF-16. He was able to check-in the code to Git without problem, but on check-out the code was translated to the Chinese characters as described by this post.

A hexdump of the files in question confirmed the issue: the 0x0A byte of each UTF-16 newline had been expanded to 0x0D 0x0A, leaving an odd number of bytes before the next character.

For us the solution was to convert the files to ASCII using the following:-

  1. Delete the offending file from the working directory
  2. Create a temporary .gitattributes file in the local directory to force git to check-out the file without performing line-ending conversion. e.g. include the line *.sql binary

  3. Check-out the file(s) from Git. You should see that the files have not been translated and have no Chinese characters.

  4. Convert the file to ASCII. We used Notepad++ for this, but it's also possible to use iconv, which is installed as part of Git For Windows. I think UTF-8 would also be an option if the file contains non-ASCII characters - but this was not necessary for our purposes.
  5. Check-in the ASCII version of the file
  6. Delete the local .gitattributes file
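The conversion in step 4 could also be scripted. Below is a minimal Python sketch, not the method used in this answer (that was Notepad++). It assumes the file starts with a byte order mark so the `utf-16` codec can detect the byte order, and the names `utf16_to_utf8` and `convert_file` are hypothetical.

```python
from pathlib import Path


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes (byte order taken from the BOM) and re-encode as UTF-8."""
    return data.decode("utf-16").encode("utf-8")


def convert_file(path: str) -> None:
    """Rewrite one file in place, working on bytes so line endings are left untouched."""
    p = Path(path)
    p.write_bytes(utf16_to_utf8(p.read_bytes()))
```

For content that is pure ASCII, the UTF-8 output is byte-for-byte the same as an ASCII file. Usage would look like `convert_file("query.sql")`, where the file name is only a placeholder.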
answered Oct 11 '22 19:10 by Rob


Here's a (bad) PowerShell script that will fix files in this state. It replaces the byte sequence 0x0D 0x00 0x0D 0x0A with 0x0D 0x00 0x0A, then overwrites the file it was given.

Afterwards you should probably re-save the file in something like UTF-8.

function Fix-Encoding
{
    Param(
        [String]$file
    )
    $f = get-item $file;
    $bytes = [System.IO.File]::ReadAllBytes($f.fullname);
    $output = new-object "System.Collections.Generic.List[System.Byte]"
    $output.Capacity = $bytes.Length

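    for ($i = 0; $i -lt $bytes.Length; $i++)
    {
        # Only test for the 4-byte sequence when at least 4 bytes remain.
        if (($i -le $bytes.Length - 4) -and $bytes[$i] -eq 0x0D -and $bytes[$i+1] -eq 0x00 -and $bytes[$i+2] -eq 0x0D -and $bytes[$i+3] -eq 0x0A)
        {
            # Drop the stray 0x0D that was inserted before the LF byte.
            $output.Add(0x0D);
            $output.Add(0x00);
            $output.Add(0x0A);
            $i += 3
        }
        else
        {
            $output.Add($bytes[$i]);
        }
    }
    # WriteAllBytes expects a byte[], so convert the list explicitly.
    [System.IO.File]::WriteAllBytes($f.FullName, $output.ToArray())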
}
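The substitution can be sanity-checked on in-memory bytes before running it against real files. This is a small Python sketch, assuming the damage came from a plain byte-level LF to CRLF replacement on a UTF-16LE file with CRLF line endings. The name `repair_crlf_damage` is hypothetical.

```python
def repair_crlf_damage(data: bytes) -> bytes:
    """Remove the stray 0x0D inserted before the LF byte of a UTF-16LE CRLF pair."""
    return data.replace(b"\r\x00\r\n", b"\r\x00\n")


original = "a\r\nb\r\n".encode("utf-16-le")
damaged = original.replace(b"\n", b"\r\n")  # simulate the byte-level conversion
repaired = repair_crlf_damage(damaged)

print(damaged.hex(" "))
print(repaired == original)
```

Like the script above, this matches the byte pattern wherever it occurs, so keep a backup of the damaged file in case it contains the same four bytes for some other reason.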
answered Oct 11 '22 18:10 by Ben