Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting readable diff displays in Mercurial on Unicode files (MS Windows)

I'm trying to store some Windows PowerShell scripts in a Mercurial repository. It seems the PowerShell editor likes to save files as UTF-16 Unicode. This means that there are lots of \0 bytes, which is what Mercurial uses to distinguish between "text" and "binary" files. I understand that this makes no difference to how Mercurial stores the data, but it does mean that it displays binary diffs, which are kind of hard to read. Is there a way to tell Mercurial that these really are text files? Presumably I would need to convince Mercurial to use an external Unicode-aware diff program for particular file types.

like image 473
Dan Menes Avatar asked Jun 10 '10 14:06

Dan Menes


2 Answers

If my other answer does not do what you want, I think this one may; although I haven't tested it on Windows at all yet, it's working well in Linux. It does what is potentially a nasty thing, in wrapping mercurial.mdiff.unidiff with a new function which converts utf-16le to utf-8. This will not affect hg st, but will affect hg diff. One potential pitfall is that the BOM will also be changed from UTF-16LE BOM to the UTF-8 BOM.

Anyway, I think it may be useful to you, so here it is.

Extension file utf16decodediff.py:

import codecs
from mercurial import mdiff

unidiff = mdiff.unidiff

def new_unidiff(a, ad, b, bd, fn1, fn2, r=None, opts=mdiff.defaultopts):
    """
    A simple wrapper around mercurial.mdiff.unidiff which first decodes
    UTF-16LE text.
    """

    if a.startswith(codecs.BOM_UTF16_LE):
        try:
            # Gets reencoded as utf-8 to be a str rather than a unicode; some
            # extensions may expect a str and may break if it's wrong.
            a = a.decode('utf-16le').encode('utf-8')
        except UnicodeDecodeError:
            pass

    if b.startswith(codecs.BOM_UTF16_LE):
        try:
            b = b.decode('utf-16le').encode('utf-8')
        except UnicodeDecodeError:
            pass

    return unidiff(a, ad, b, bd, fn1, fn2, r, opts)

mdiff.unidiff = new_unidiff

In .hgrc:

[extensions]
utf16decodediff = ~/.hgexts/utf16decodediff.py

(Or equivalent paths.)

like image 89
Chris Morgan Avatar answered Oct 31 '22 14:10

Chris Morgan


I have worked around this by creating a new file with NotePad++ and saving it as a PowerShell file (.ps1 extension). NotePad++ will create the file as a plain text ANSI file. Once created I can open the file in the PowerShell editor and make any changes as necessary without the editor modifying the file encoding.

Disclaimer: I encountered this just moments ago and so I am not sure if there are any repercussions but so far my scripts appear to work as normal and my diffs are showing up nicely.

like image 21
Ryan Taylor Avatar answered Oct 31 '22 12:10

Ryan Taylor