I have a text file that claims to be UTF-8 encoded; when I call file -I $file it prints $file: text/plain; charset=utf-8. But when I open it with UTF-8 encoding, some characters appear corrupted: the file is supposed to be German, but special German characters such as ö are displayed as ö.
I guessed that the claim of being UTF-8 was wrong and ran the enca tool to guess the real encoding, but unfortunately enca tells me that the language de (German) is not supported.
Is there another way to fix the file?
The UTF-8 encoded form of ö (U+00F6) is 0xC3 0xB6, and if these bytes are interpreted as ISO-8859-1 they appear as ö (U+00C3 U+00B6). So either the file is actually being read and interpreted as ISO-8859-1 even though you expect otherwise, or there has been a double encoding: at some earlier point, the file (or part of it) was read as if it were ISO-8859-1 even though it was UTF-8, and the misinterpreted data was then written out as UTF-8.
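If the file really is double-encoded, the damage can usually be reversed by decoding it as UTF-8 and re-encoding the result as ISO-8859-1, which restores the original UTF-8 bytes. A minimal sketch with iconv (the file names here are placeholders, not from the question):

```shell
# broken.txt contains the bytes 0xC3 0x83 0xC2 0xB6 — a double-encoded "ö":
# the UTF-8 bytes for ö (0xC3 0xB6) were themselves re-encoded as UTF-8.
printf '\xc3\x83\xc2\xb6\n' > broken.txt

# Decode as UTF-8, re-encode as ISO-8859-1: each misread code point
# (U+00C3, U+00B6) collapses back to a single byte (0xC3, 0xB6),
# which is exactly the UTF-8 encoding of ö.
iconv -f UTF-8 -t ISO-8859-1 broken.txt > fixed.txt

cat fixed.txt   # prints ö
```

Note this only works if every character in the file survived the ISO-8859-1 round trip; if the conversion fails partway through, the file was probably mangled some other way.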
To get a file to read properly in a given encoding, you need three things:
Note that (2) is not strictly necessary; but if the file's encoding is detected incorrectly, you will need to manually re-read the file with the correct encoding, for example using :e ++enc=utf-8 for a UTF-8 file that was not detected as such.
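As a sketch, a Vim configuration along these lines makes UTF-8 files detect correctly in most cases (the particular fileencodings order is a common convention, assumed here rather than taken from the answer):

```vim
" Use UTF-8 internally for all of Vim's buffers and registers.
set encoding=utf-8

" Encodings to try, in order, when reading a file:
" a BOM first, then UTF-8, then fall back to Latin-1.
set fileencodings=ucs-bom,utf-8,latin1
```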
See http://vim.wikia.com/wiki/Working_with_Unicode for getting all three of these concepts correct.