
How can I tell TortoiseHg to display a UTF-16 file as non-binary?

In a Microsoft Access 2007 project, the Access form objects are exported to files by a dedicated tool that uses the built-in function "SaveAsText". This is necessary because Access doesn't store any of its code modules in separate files on its own.

The file starts with the bytes "FF FE" (the UTF-16 little-endian byte order mark, according to http://de.wikipedia.org/wiki/Byte_Order_Mark). I presume that Hg treats this file as binary because of the many NUL characters in it. Hence the diff pane in the TortoiseHg workbench always reports

File or diffs not displayed: File is binary.

which is quite understandable under this assumption. Nevertheless, this file is just ordinary source code. I can view it in Windows Notepad, for example, without any problems.

Is there any way to tell Mercurial that this particular file should be treated as text, not binary?

Edit: In addition to the accepted answer below, I decided not to change the export behaviour and to use the "Visual Diff" command (select the file, then press Ctrl+D) instead.

Christoph Jüngling asked Jul 04 '11 15:07



2 Answers

I'm guessing that you frequently or occasionally export the form objects in order to track source code changes.

The only way to convince Mercurial that a file is not binary is to avoid NUL bytes.

You may want to convert the source code files to ASCII (or perhaps ANSI) encoding as an additional step in your export, in order to avoid the NUL bytes. If the source code files contain Unicode characters, you might try UTF-8 instead: it uses multi-byte sequences only where necessary and single bytes otherwise, so it avoids NUL bytes as well. I tried it out briefly and Mercurial handles UTF-8: it shows the actual diff rather than "File is binary". I committed on the command line but viewed the diff in TortoiseHg. The links below cover command-line encoding challenges.
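As a rough illustration of that extra export step, here is a minimal Python sketch that re-encodes an exported file from UTF-16 to UTF-8. The function name `utf16_to_utf8` and the file names are made up for the example; it assumes the exported file starts with a byte order mark, as described in the question.

```python
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 file (with BOM) as UTF-8 without a BOM.

    Hypothetical helper for illustration; not part of Mercurial or Access.
    """
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = src.read_bytes().decode("utf-16")
    # Working on bytes keeps the original line endings (e.g. CRLF) untouched.
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Assumed file names; replace them with the files your export produces.
    source = Path("Form_Main.txt")
    if source.exists():
        utf16_to_utf8(source, Path("Form_Main.utf8.txt"))
```

Running this on every exported file before committing should leave files without NUL bytes, as long as the source itself contains no NUL characters.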

The hgrc encode/decode sections might be particularly useful in helping to filter the UTF-16 files into something that works better.

A couple of other pages on Mercurial and encoding:

  • Character Encoding On Windows
  • Encoding Strategy

Tested with TortoiseHg 2.1 and Mercurial 1.9.

Joel B Fant answered Sep 21 '22 19:09



From https://www.mercurial-scm.org/wiki/BinaryFiles:

The question naturally arises, what is a binary file anyway? It turns out there's really no good answer to this question, so Mercurial uses the same heuristic that programs like diff(1) use. The test is simply if there are any NUL bytes in a file.

For diff, export, and annotate, this will get things right almost all of the time and it will not attempt to process files it thinks are binary. If necessary, you can force these commands to treat files as text with -a.
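To see why the quoted heuristic trips on UTF-16, here is a small Python sketch of a NUL-byte test. `looks_binary` is a made-up name that mirrors the heuristic as described above; it is not Mercurial's actual code.

```python
def looks_binary(data: bytes) -> bool:
    """Illustrative NUL-byte test: any NUL byte marks the data as binary."""
    return b"\x00" in data


sample = "Option Explicit\r\n"

# UTF-16 stores each ASCII character as two bytes, one of which is NUL.
utf16_bytes = b"\xff\xfe" + sample.encode("utf-16-le")
# UTF-8 stores each ASCII character as a single non-NUL byte.
utf8_bytes = sample.encode("utf-8")

print(looks_binary(utf16_bytes))  # True
print(looks_binary(utf8_bytes))   # False
```

The same text is flagged only in its UTF-16 form, which matches what the diff pane reports for the exported Access files.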

Graham Borland answered Sep 20 '22 19:09
