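Git recognizes files encoded in ASCII or one of its supersets (e.g. UTF-8, ISO-8859-1, …) as text files. In practice, its heuristic treats a file as binary if the first 8,000 bytes contain a NUL byte.
UTF-16 encodes text in 16-bit code units, so it is not compatible with single-byte ASCII: every ASCII-range character becomes two bytes, one of which is NUL (0x00). A program that is not Unicode-aware will, at best, display a NUL character between every pair of ASCII-range characters.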
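To see those NUL bytes for yourself, here is a minimal shell sketch. It assumes an iconv that accepts the UTF-16LE encoding name (GNU and macOS iconv both do), and the function name utf16_hex is made up for illustration. It encodes two ASCII characters as little-endian UTF-16 without a BOM and dumps the bytes in hex:

```shell
# Hypothetical helper: print the UTF-16LE bytes of a string in hex.
utf16_hex() {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
}

utf16_hex 'Hi'   # bytes 48 00 69 00: 'H', NUL, 'i', NUL
```

Every other byte is 00. That is the NUL that Git's binary check and non-Unicode-aware tools stumble over.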
There are a few options you can use to detect the encoding: check the Content-Type header to see if it includes a charset parameter, which would indicate the encoding (e.g. Content-Type: text/plain; charset=utf-16); or check whether the uploaded data starts with a BOM (byte order mark), the first few bytes in the file, which would map to the Unicode character U+FEFF: 2 bytes for ...
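The BOM check can be sketched in a few lines of shell. This is a minimal sketch under stated assumptions: detect_bom is a hypothetical name, and only the UTF-8 and UTF-16 BOMs are recognized. A UTF-32LE file, whose BOM is FF FE 00 00, would be misreported as UTF-16LE by this simple version.

```shell
# Hypothetical helper: report which byte order mark (BOM), if any, a file starts with.
# Reads the first three bytes as a hex string such as "fffe41".
detect_bom() {
  case "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" in
    efbbbf) echo "UTF-8" ;;     # EF BB BF
    fffe*)  echo "UTF-16LE" ;;  # FF FE (also the start of a UTF-32LE BOM)
    feff*)  echo "UTF-16BE" ;;  # FE FF
    *)      echo "none" ;;
  esac
}
```

For example, detect_bom Localizable.strings prints UTF-16LE for a little-endian UTF-16 file that carries a BOM. A file without a BOM yields none, in which case you have to fall back on other hints such as the charset parameter.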
The main advantage of UTF-16: characters in the BMP (Basic Multilingual Plane), including Latin, Cyrillic, most Chinese (the PRC made support for some code points outside the BMP mandatory) and most Japanese, can be represented with 2 bytes.
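You can check the size claim directly. The sketch below assumes the script file is saved as UTF-8, so the literal characters reach iconv as UTF-8, and byte_count is a hypothetical helper name. It uses UTF-16LE so that iconv does not add a 2-byte BOM to the count:

```shell
# Hypothetical helper: number of bytes string $2 takes in encoding $1.
byte_count() {
  printf '%s' "$2" | iconv -f UTF-8 -t "$1" | wc -c | tr -d ' '
}

for ch in A Ж 中 😀; do
  echo "$ch: UTF-8=$(byte_count UTF-8 "$ch") UTF-16=$(byte_count UTF-16LE "$ch")"
done
```

Latin A takes 1 byte in UTF-8 but 2 in UTF-16. Cyrillic Ж takes 2 in both. Chinese 中 takes 3 in UTF-8 but only 2 in UTF-16. The emoji lies outside the BMP and needs 4 bytes (a surrogate pair) in UTF-16.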
I've been struggling with this problem for a while, and just discovered (for me) a perfect solution:
$ git config --global diff.tool vimdiff # or merge.tool to get merging too!
$ git difftool commit1 commit2
git difftool takes the same arguments as git diff would, but runs a diff program of your choice instead of Git's built-in diff. So pick a multibyte-aware diff (in my case, vim in diff mode) and just use git difftool instead of git diff.
Find "difftool" too long to type? No problem:
$ git config --global alias.dt difftool
$ git dt commit1 commit2
Git rocks.
There is a very simple solution that works out of the box on Unix-like systems.
For example, for Apple's .strings files:
Create a .gitattributes file in the root of your repository with:
*.strings diff=localizablestrings
Add the following to your ~/.gitconfig file:
[diff "localizablestrings"]
textconv = "iconv -f utf-16 -t utf-8"
Source: Diff .strings files in Git (and an older post from 2010).
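To try the textconv driver without touching your real repositories, here is a self-contained sketch. It assumes git and iconv are installed. It uses a throwaway repository in a temp directory, stores the driver in that repository's own config instead of ~/.gitconfig, and passes a dummy commit identity.

```shell
# Throwaway repository in a temp directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo '*.strings diff=localizablestrings' > .gitattributes
# Same textconv driver as above, but in this repo's config, not ~/.gitconfig
git config diff.localizablestrings.textconv 'iconv -f utf-16 -t utf-8'
# Commit a UTF-16 .strings file (iconv -t utf-16 writes a BOM first)
printf '"greeting" = "Hello";\n' | iconv -f utf-8 -t utf-16 > Localizable.strings
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial
# Change the file: git diff now shows converted text, not "Binary files differ"
printf '"greeting" = "Bonjour";\n' | iconv -f utf-8 -t utf-16 > Localizable.strings
git diff
```

Git runs the textconv command on both sides of the diff and then diffs the UTF-8 output. Note that the resulting diff is for viewing only and cannot be applied as a patch.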
Have you tried setting your .gitattributes to treat it as a text file?
For example:
*.vmc diff
More details at http://www.git-scm.com/docs/gitattributes.html.
By default, it looks like Git won't work well with UTF-16. For such a file, you have to make sure that no CRLF processing is done on it, but you still want diff and merge to treat it as a normal text file (this ignores whether your terminal or editor can handle UTF-16).
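But looking at the gitattributes manpage, here is how the built-in binary macro attribute is defined:
[attr]binary -diff -crlf
(Current versions of the manpage write this as [attr]binary -diff -merge -text. crlf is the deprecated older name of the text attribute, so -crlf means the same as -text.)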
So it seems to me that you could define a custom attribute for utf16 in your top-level .gitattributes (note that I add merge here to be sure it is treated as text):
[attr]utf16 diff merge -crlf
From there, you would be able to specify something like this in any .gitattributes file:
*.vmc utf16
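To confirm that the macro expands the way you expect, you can ask Git directly with git check-attr. The sketch below uses a throwaway repository, and sample.vmc is a made-up path that does not even need to exist:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# The [attr] macro definition must live in the top-level .gitattributes
printf '[attr]utf16 diff merge -crlf\n*.vmc utf16\n' > .gitattributes
# Show the attributes that the utf16 macro sets on a .vmc path
git check-attr diff merge crlf -- sample.vmc
```

This should report diff and merge as set and crlf as unset for sample.vmc. That is the combination the macro is meant to apply.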
Also note that you should still be able to diff a file, even if Git thinks it's binary, with:
git diff --text
Edit
This answer basically says that GNU diff doesn't work very well with UTF-16, or even with UTF-8. If you want Git to use a different tool to show differences (via --ext-diff), that answer suggests Guiffy.
But what you likely need is just to diff a UTF-16 file that contains only ASCII characters. One way to get that to work is to use --ext-diff with the following shell script:
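#!/bin/bash
# Git passes 7 arguments: path old-file old-hex old-mode new-file new-hex new-mode
diff <(iconv -f utf-16 -t utf-8 "$2") <(iconv -f utf-16 -t utf-8 "$5")
exit 0  # diff exits 1 on differences, which Git treats as a fatal error by default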
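To check how Git actually invokes such a script, here is a self-contained sketch. It assumes bash at /bin/bash, git and iconv are installed, and utf16diff.sh is a hypothetical file name. Git calls an external diff with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), so the two file names are $2 and $5. With its default settings, Git also treats a non-zero exit status from the external diff as an error, which is why the script ends with exit 0.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
cat > utf16diff.sh <<'EOF'
#!/bin/bash
# Git passes 7 arguments: path old-file old-hex old-mode new-file new-hex new-mode
diff <(iconv -f utf-16 -t utf-8 "$2") <(iconv -f utf-16 -t utf-8 "$5")
exit 0  # diff exits 1 on differences, which Git treats as a fatal error by default
EOF
chmod +x utf16diff.sh
printf 'hi\n' | iconv -f utf-8 -t utf-16 > sample.vmc
git add sample.vmc
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial
printf 'bye\n' | iconv -f utf-8 -t utf-16 > sample.vmc
# GIT_EXTERNAL_DIFF selects the program that --ext-diff runs
GIT_EXTERNAL_DIFF="$repo/utf16diff.sh" git diff --ext-diff -- sample.vmc
```

The output is plain diff(1) format (< hi / > bye) computed on the UTF-8 conversions of both versions.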
Note that converting to UTF-8 might work for merging as well; you just have to make sure the conversion is done in both directions.
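One way to do the conversion in both directions is a clean/smudge filter. This technique is not named in the answer above. The sketch assumes git and iconv are installed, and utf16 is an arbitrary filter name. The clean command runs when content is staged (UTF-16 in the working tree becomes UTF-8 in the repository). The smudge command runs on checkout (UTF-8 back to UTF-16). Because the repository then stores UTF-8, Git diffs and merges it as ordinary text. A caveat: if iconv's output BOM or byte order differs from your original files, Git may report them as modified after checkout. Git 2.18 and later also offer a built-in working-tree-encoding attribute for the same purpose.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo '*.vmc filter=utf16' > .gitattributes
# clean: working tree (UTF-16) -> repository (UTF-8), runs on git add
git config filter.utf16.clean 'iconv -f utf-16 -t utf-8'
# smudge: repository (UTF-8) -> working tree (UTF-16), runs on checkout
git config filter.utf16.smudge 'iconv -f utf-8 -t utf-16'
printf 'hello\n' | iconv -f utf-8 -t utf-16 > sample.vmc
git add sample.vmc
# The staged blob is plain UTF-8 text
git cat-file -p :sample.vmc
```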
As for the output to the terminal when looking at a diff of a UTF-16 file:
Trying to diff like that results in binary garbage spewed to the screen. If git is using GNU diff, it would seem that GNU diff is not Unicode-aware.
GNU diff doesn't really care about Unicode, so when you use diff --text it just diffs and outputs the text. (Git actually uses its own built-in diff code rather than GNU diff, but it is byte-oriented in the same way.) The problem is that the terminal you're using can't handle the UTF-16 that's emitted, combined with the diff marks, which are ASCII characters.
The solution is to filter through cmd.exe /c "type %1". cmd's type builtin will do the conversion, so you can use it with the textconv feature of git diff to enable text diffing of UTF-16 files (it should work with UTF-8 as well, although that is untested).
Quoting from the gitattributes man page:
Sometimes it is desirable to see the diff of a text-converted version of some binary files. For example, a word processor document can be converted to an ASCII text representation, and the diff of the text shown. Even though this conversion loses some information, the resulting diff is useful for human viewing (but cannot be applied directly).
The textconv config option is used to define a program for performing such a conversion. The program should take a single argument, the name of a file to convert, and produce the resulting text on stdout.
For example, to show the diff of the exif information of a file instead of the binary information (assuming you have the exif tool installed), add the following section to your $GIT_DIR/config file (or $HOME/.gitconfig file):
[diff "jpg"]
textconv = exif
Here is a solution for mingw32; Cygwin fans may have to alter the approach. The issue is with passing the filename to convert to cmd.exe: it will be using forward slashes, and cmd assumes backslash directory separators.
Create the single-argument script that will do the conversion to stdout, c:\path\to\some\script.sh:
#!/bin/bash
# Replace each forward slash in the path with backslashes (escaped for sed)
SED='s/\//\\\\\\\\/g'
FILE=$(echo "$1" | sed -e "$SED")
cmd.exe /c "type $FILE"
Set up Git to be able to use the script file. Inside your Git config (~/.gitconfig or .git/config; see man git-config), put this:
[diff "cmdtype"]
textconv = c:/path/to/some/script.sh
Mark the files to apply this workaround to using .gitattributes files (see man gitattributes(5)):
*.vmc diff=cmdtype
Then use git diff on your files.
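The slash-to-backslash step can be tried in isolation. Here is a sketch using tr instead of sed; to_windows_path is a hypothetical name. It produces exactly one backslash per separator. Whether cmd.exe then accepts the path depends on how your MSYS or Cygwin shell quotes arguments for native Windows programs, and that is not verified here. On Cygwin and MSYS2, the cygpath -w utility is an alternative.

```shell
# Hypothetical helper: turn c:/path/to/file into c:\path\to\file
to_windows_path() {
  printf '%s\n' "$1" | tr '/' '\\'
}

to_windows_path c:/path/to/file.vmc
```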