
How to fix wrong text file encoding?

I have a text file that claims to be UTF-8 encoded: when I call file -I $file, it prints $file: text/plain; charset=utf-8. But when I open it as UTF-8, some characters appear corrupted. The file is supposed to be German, but the special German characters like ö are displayed as Ã¶.

I guessed that the claim to be UTF-8 is wrong and ran enca to guess the real encoding. Sadly, enca tells me that the language de (German) is not supported.

Is there another way to fix the file?

katosh asked Jan 11 '23 05:01



2 Answers

The UTF-8 encoded form of “ö” (U+00F6) is 0xC3 0xB6, and if these bytes are interpreted as ISO-8859-1 they are “Ã¶” (U+00C3 U+00B6). So either the file is actually being read and interpreted as ISO-8859-1, even though you expect otherwise, or there has been a double encoding: at some earlier point the file, or part of it, was read as if it were ISO-8859-1 (even though it was UTF-8), and the misinterpreted data was then written out encoded as UTF-8.
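The double-encoding case can be reproduced and undone in a few lines of Python. This is only a sketch: the helper name repair_double_encoding is made up for illustration, and it assumes exactly one round of "UTF-8 bytes misread as ISO-8859-1" has happened. If the misreading used Windows-1252 instead, the codec name would have to change.

```python
def repair_double_encoding(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as ISO-8859-1' (hypothetical helper).

    If the text does not look double encoded, it is returned unchanged.
    """
    try:
        # Map each character back to the single byte it was misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the corruption described above.
utf8_bytes = "ö".encode("utf-8")          # the bytes 0xC3 0xB6
misread = utf8_bytes.decode("latin-1")    # two characters: U+00C3 U+00B6
double_encoded = misread.encode("utf-8")  # what a double-encoded file contains

# Read the damaged bytes as UTF-8 (what the editor does), then repair.
repaired = repair_double_encoding(double_encoded.decode("utf-8"))
print(repr(misread), "->", repr(repaired))
```

If the file is plain UTF-8 and is only being displayed as ISO-8859-1, nothing needs repairing on disk; the fix is then on the reading side.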

Jukka K. Korpela answered Jan 17 '23 13:01



To get a file to read properly in a given encoding in Vim, you need three things:

  1. 'encoding', which controls the characters Vim can store and display, must be able to represent all the characters in your file.
  2. 'fileencodings', which controls which encodings Vim will attempt to recognize, must be set so that your file's encoding is recognized.
  3. 'fileencoding' must be set properly, normally by being detected automatically through the 'fileencodings' setting, to the encoding your file is stored in.

Note that (2) is not strictly necessary, but if the file encoding is detected incorrectly, you will need to manually re-read the file in the correct encoding, for example with :e ++enc=utf-8 for a UTF-8 file that was not detected as such.
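The idea behind an ordered list of candidate encodings can be illustrated outside Vim as well. The Python sketch below is only an analogy, not Vim's actual detection logic: it tries each candidate in order and keeps the first one that decodes without error. The function name decode_with_candidates and the default candidate list are assumptions made for this example.

```python
def decode_with_candidates(data: bytes, candidates=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: order matters, because 'latin-1' accepts any byte
    sequence and therefore has to come after stricter encodings like UTF-8.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Valid UTF-8 is recognized by the first candidate.
print(decode_with_candidates(b"\xc3\xb6"))
# A lone 0xF6 byte is not valid UTF-8, so the fallback is used.
print(decode_with_candidates(b"\xf6"))
```

As with automatic detection in an editor, a successful decode does not prove the guess is right; it only means the bytes are valid in that encoding.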

See http://vim.wikia.com/wiki/Working_with_Unicode for getting all three of these concepts correct.

Ben answered Jan 17 '23 15:01
