I have a text file that claims to be UTF-8 encoded; when I call file -I $file it prints $file: text/plain; charset=utf-8. But when I open it with UTF-8 encoding, some characters appear corrupted: the file is supposed to be German, but special German characters such as ö are displayed as ö.
I guessed that the claim of being UTF-8 was wrong and ran the enca tool to guess the real encoding, but unfortunately enca tells me that the language de (German) is not supported.
Is there another way to fix the file?
The UTF-8 encoded form of ö (U+00F6) is 0xC3 0xB6, and if these bytes are interpreted as ISO-8859-1 they appear as ö (U+00C3 U+00B6). So either the file is actually being read and interpreted as ISO-8859-1 even though you expect otherwise, or there has been a double encoding: at some earlier point, the file (or part of it) was read as if it were ISO-8859-1 even though it was UTF-8, and the misinterpreted data was then written out as UTF-8.
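If the file really is double-encoded, the damage can usually be reversed by decoding it as UTF-8 and re-encoding the result as ISO-8859-1, which restores the original UTF-8 bytes. A minimal sketch with iconv (the file names here are placeholders, not from the question):

```shell
# broken.txt contains the bytes 0xC3 0x83 0xC2 0xB6 — a double-encoded "ö":
# the UTF-8 bytes for ö (0xC3 0xB6) were themselves re-encoded as UTF-8.
printf '\xc3\x83\xc2\xb6\n' > broken.txt

# Decode as UTF-8, re-encode as ISO-8859-1: each misread code point
# (U+00C3, U+00B6) collapses back to a single byte (0xC3, 0xB6),
# which is exactly the UTF-8 encoding of ö.
iconv -f UTF-8 -t ISO-8859-1 broken.txt > fixed.txt

cat fixed.txt   # prints ö
```

Note this only works if every character in the file survived the ISO-8859-1 round trip; if the conversion fails partway through, the file was probably mangled some other way.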
To get a file to read properly in a given encoding, you need three things:
Note that (2) is not strictly necessary; but if the file's encoding is detected incorrectly, you will need to manually re-read the file with the correct encoding, for example using :e ++enc=utf-8 for a UTF-8 file that was not detected as such.
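As a sketch, a Vim configuration along these lines makes UTF-8 files detect correctly in most cases (the particular fileencodings order is a common convention, assumed here rather than taken from the answer):

```vim
" Use UTF-8 internally for all of Vim's buffers and registers.
set encoding=utf-8

" Encodings to try, in order, when reading a file:
" a BOM first, then UTF-8, then fall back to Latin-1.
set fileencodings=ucs-bom,utf-8,latin1
```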
See http://vim.wikia.com/wiki/Working_with_Unicode for getting all three of these concepts correct.