I have been given some HTML files that use the Mac OS Roman file encoding. The files contain French text, but in an editor many of the accented characters look wrong (i.e. not French):
Si cette option est sÈlectionnÈe, <removed> tentera de communiquer avec votre tÈlescope seulement ‡ líaide díun ...
The capital E with accent does display properly in the browser as é, as do the other strange characters.
I also have some UTF-8 French files that look normal in an editor (é looks like é). What I'd like to do is convert all the Mac Roman files to UTF-8 for easier maintenance.
Simply changing the file encoding in the editor doesn't do this. The strange characters are still strange.
Short of making a conversion dictionary and doing a Find/Replace on all the files, is there a way to do this?
If your editor isn’t showing it correctly when you specify the encoding, you have given it the wrong encoding. You need to figure out what encoding you really have.
You appear to have a byte valued 0xE9 where you need a Unicode LATIN SMALL LETTER E WITH ACUTE character. A MacRoman 0xE9 byte is a LATIN CAPITAL LETTER E WITH GRAVE character, which is what your editor is displaying because you said it was MacRoman. But it is not.
However, Unicode code point U+00E9 is indeed LATIN SMALL LETTER E WITH ACUTE.
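Therefore, it is not MacRoman that you have there, but almost certainly ISO-8859-1 or ISO-8859-15, or possibly Windows-1252: the í that appears where an apostrophe belongs in your sample is byte 0x92, which is a curly apostrophe in Windows-1252 and only a control character in ISO-8859-1.
So use something like
$ iconv -f ISO-8859-1 -t UTF-8 < input.latin1 > output.utf8
to do the conversion, substituting CP1252 for ISO-8859-1 if the file turns out to be Windows-1252.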
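If you want to check that reasoning before converting anything, here is a minimal Python sketch. It decodes the bytes behind the È, ‡ and í in the sample under three candidate encodings, then undoes the wrong MacRoman decode on a fragment of the sample. The function names `decode_candidates` and `repair` are made up for illustration, and the default of `cp1252` is an assumption based on that sample, not something confirmed for your files.

```python
def decode_candidates(raw: bytes) -> dict[str, str]:
    """Decode the same bytes under each candidate encoding for comparison."""
    candidates = ("mac_roman", "latin_1", "cp1252")
    return {name: raw.decode(name) for name in candidates}


def repair(garbled: str, shown_as: str = "mac_roman", really: str = "cp1252") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    return garbled.encode(shown_as).decode(really)


if __name__ == "__main__":
    # 0xE9, 0xE0 and 0x92 are the bytes behind the È, ‡ and í in the sample.
    for name, text in decode_candidates(bytes([0xE9, 0xE0, 0x92])).items():
        print(f"{name:10} {text!r}")
    # A fragment of the text as the editor displayed it.
    print(repair("sÈlectionnÈe ... ‡ líaide díun"))
```

Whichever candidate gives readable French for all three bytes is the encoding to pass to iconv's `-f` option.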
To actually answer the question as titled, "Converting Mac Roman character to equivalent UTF-8", convert the encoding of the file from Mac OS Roman to UTF-8:
$ iconv -f macintosh -t UTF-8 < INPUT_FILE_PATH > OUTPUT_FILE_PATH
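Since the question is about converting a whole set of files, the same conversion can be sketched as a small Python loop. This is only an illustration: `convert_tree`, the `*.html` pattern and the `.utf8` suffix are my own choices, and `source_encoding` should be whatever encoding the files really use (pass `"cp1252"` or `"latin_1"` instead of `"mac_roman"` if that is what they turn out to be).

```python
from pathlib import Path


def convert_tree(root: str, source_encoding: str = "mac_roman",
                 pattern: str = "*.html") -> list[Path]:
    """Write a UTF-8 copy of every matching file under root.

    Originals are left untouched; each copy gets a ".utf8" suffix.
    """
    written = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_bytes().decode(source_encoding)
        target = path.with_name(path.name + ".utf8")
        # Write bytes directly so line endings are preserved exactly.
        target.write_bytes(text.encode("utf-8"))
        written.append(target)
    return written
```

Calling `convert_tree("site")` would leave a `page.html.utf8` next to each `page.html`, so you can inspect the results before replacing anything. If the HTML files declare their encoding in a `<meta>` tag, that declaration probably needs updating by hand as well, since this only changes the bytes.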