When creating a diff patch with Git Shell in Windows (when using GitHub for Windows), the character encoding of the patch will be UCS-2 Little Endian according to Notepad++ (see the screenshots below).
How can I change this behavior, and force git to create patches with ANSI or UTF-8 without BOM character encoding?
This causes a problem because UCS-2 Little Endian encoded patches cannot be applied; I have to manually convert them to ANSI. If I don't, I get a "fatal: unrecognized input" error.
Since then, I also realized that I have to manually convert the EOL from Windows format (\r\n) to UNIX (\n) in Notepad++ (Edit > EOL Conversion > UNIX). If I don't do this, I get a "trailing whitespace" error (even if all the whitespace is trimmed: "TextFX" > "TextFX Edit" > "Trim Trailing Spaces").
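The two manual conversions described above (re-encoding and EOL conversion) could also be scripted. The following is only a minimal sketch in Python, not something from the original workflow: the function names normalize_patch and normalize_patch_file are made up for illustration, and it assumes that a file without a BOM is already UTF-8. It also rewrites every \r\n, so it is not suitable for patches that legitimately contain Windows line endings.

```python
import codecs


def normalize_patch(raw: bytes) -> bytes:
    """Return the patch as UTF-8 without BOM and with LF line endings."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        # "UCS-2 Little Endian" in Notepad++: UTF-16 LE with a BOM.
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    elif raw.startswith(codecs.BOM_UTF8):
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        # Assumption: no BOM means the bytes are already UTF-8.
        text = raw.decode("utf-8")
    # Convert Windows line endings to UNIX line endings.
    return text.replace("\r\n", "\n").encode("utf-8")


def normalize_patch_file(src: str, dst: str) -> None:
    """Read src as bytes, normalize it, and write the result to dst."""
    with open(src, "rb") as f:
        raw = f.read()
    with open(dst, "wb") as f:
        f.write(normalize_patch(raw))
```

For example, normalize_patch_file("my.patch", "my-fixed.patch") followed by git apply my-fixed.patch would replace the Notepad++ steps, under the assumptions stated above.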
So, the steps I need to do for the patch to be applied:
Please, take a look at this screenshot:
I'm not a Windows user, so take my answer with a grain of salt. According to the Windows PowerShell Cookbook, PowerShell preprocesses the output of git diff, splitting it into lines. The documentation of the Out-File cmdlet suggests that > is the same as | Out-File without parameters. We also find this comment in the PowerShell documentation:
The results of using the Out-File cmdlet may not be what you expect if you are used to traditional output redirection. To understand its behavior, you must be aware of the context in which the Out-File cmdlet operates.
By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:
[...]
Out-file formats file contents to look like console output. This causes the output to be truncated just as it is in a console window in most circumstances. [...]
To get output that does not force line wraps to match the screen width, you can use the Width parameter to specify line width.
So, apparently it is not Git that chooses the character encoding, but Out-File. This suggests a) that PowerShell redirection really should only be used for text and b) that
| Out-File -encoding ASCII -Width 2147483647 my.patch
will avoid the encoding problems. However, this still does not solve the problem with Windows vs. Unix line endings. There are cmdlets (see the PowerShell Community Extensions) to convert line endings.
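To make the encoding difference concrete, here is a small Python illustration (Python is used only for demonstration; it is not part of the PowerShell workflow). It assumes that the "Unicode" default mentioned in the documentation means UTF-16 Little Endian with a byte order mark, which matches what Notepad++ reports as UCS-2 Little Endian. The header line is an invented example.

```python
import codecs

# A typical first line of a patch, with the Windows line ending PowerShell adds.
header = "diff --git a/file.txt b/file.txt\r\n"

# What a tool reading the patch expects: plain ASCII/UTF-8 bytes.
as_utf8 = header.encode("utf-8")

# Assumed layout of the redirected file: BOM followed by two bytes per character.
as_utf16 = codecs.BOM_UTF16_LE + header.encode("utf-16-le")

print(as_utf8[:8])   # b'diff --g'
print(as_utf16[:8])  # b'\xff\xfed\x00i\x00f\x00'
```

The second file no longer starts with the bytes "diff", which would be consistent with the "unrecognized input" error reported in the question.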
However, all this recoding does not increase my confidence in a patch (which has no encoding itself, but is just a string of bytes). The aforementioned Cookbook contains a script, Invoke-BinaryProcess, which can be used to redirect the output of a command unmodified.
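The same idea, capturing the bytes of a command without any recoding, can be sketched outside PowerShell as well. This is not the Cookbook's Invoke-BinaryProcess script; it is a hypothetical Python helper (run_to_file is an invented name) that relies only on the standard subprocess module.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path byte for byte."""
    with open(out_path, "wb") as out:
        # The child process writes straight into the binary file handle,
        # so no decoding or line-ending translation takes place.
        subprocess.run(cmd, stdout=out, check=True)
```

Assuming git is on the PATH, run_to_file(["git", "diff"], "my.patch") would then produce a patch containing exactly the bytes Git emitted.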
To sidestep this whole issue, an alternative would be to use git format-patch instead of git diff. format-patch writes directly to a file (and not to stdout), so its output is not recoded. However, it can only create patches from commits, not arbitrary diffs.
format-patch takes a commit range (e.g. master^10..master^5) or a single commit (e.g. X, meaning X..HEAD) and creates patch files of the form NNNN-SUBJECT.patch, where NNNN is an increasing 4-digit number and SUBJECT is the (mangled) subject of the patch. An output directory can be specified with -o.
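As a rough illustration of how such a call is put together, the sketch below builds the format-patch command line from a revision range and an optional output directory. The helper names format_patch_cmd and format_patch are invented for this example, and the directory name "patches" is arbitrary; only the -o option and the range syntax come from the description above.

```python
import subprocess


def format_patch_cmd(rev_range, out_dir=None):
    """Build the argument list for git format-patch."""
    cmd = ["git", "format-patch"]
    if out_dir is not None:
        # -o selects the directory the NNNN-SUBJECT.patch files go to.
        cmd += ["-o", out_dir]
    cmd.append(rev_range)
    return cmd


def format_patch(rev_range, out_dir=None):
    """Run git format-patch and return the lines it prints."""
    result = subprocess.run(
        format_patch_cmd(rev_range, out_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    # git format-patch prints the name of each patch file it creates.
    return result.stdout.splitlines()
```

For instance, format_patch("master^10..master^5", "patches") would run git format-patch -o patches master^10..master^5 inside the current repository. Because the arguments are passed as a list, no shell is involved and the ^ characters need no escaping.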
If you use PowerShell, you can also just do:
cmd /c "git diff > patch.diff"
This runs the command through CMD, which writes the output to the file as is.
In case this helps anyone, using the good old Command Prompt instead of PowerShell works flawlessly; it doesn't seem to suffer from any of the issues present in PowerShell in regards to character encoding and EOLs.
Running dos2unix on the diff generated in PowerShell seems to do the trick for me. I was then able to apply the diff successfully.
dos2unix.exe diff_file
git apply diff_file