
Rationale of fileencoding and encoding in vim or elsewhere

I don't understand why Vim has both 'encoding' and 'fileencoding'.

As far as I know, a file is just an array of bytes. When we create a text file, we build an array of characters (or symbols), encode that character array with some encoding X into an array of bytes, and save the byte array to disk. When a text editor reads the file, it decodes the byte array with encoding X to reconstruct the original character array and displays each character with a glyph from the font. Only one encoding is involved in this process.

"In VIM set encoding and fileencoding utf-8", which refers to the Vim wiki page on working with Unicode, says:

encoding sets how vim shall represent characters internally. Utf-8 is necessary for most flavors of Unicode.

fileencoding sets the encoding for a particular file (local to buffer)

"How vim shall represent characters internally" vs "encoding for a particular file"... does this resemble Unicode vs UTF-8? If so, why should a user bother with the former?

Any hint?

asked Dec 20 '22 17:12 by Frozen Flame


1 Answer

You're right; most programs have a fixed internal encoding. In terms of C data types, that is either char, which usually follows the underlying locale (and may then be unable to represent all characters) or holds UTF-8, or wchar_t (wide characters), which can represent the Unicode range. The choice is mainly driven by the programming language and the available APIs, since converting back and forth is tedious and inefficient.

Vim supports a large variety of platforms (starting with the old Amiga, where development began) and is geared towards programmers and highly advanced users, so it lets you configure the internal representation ('encoding') separately from the encoding of each file ('fileencoding').
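
To see the two settings side by side, something like the following can be typed in a running Vim. This is only a sketch: latin1 is an example value, not a recommendation, and it assumes the current buffer is backed by a file.

```vim
" Show the internal representation (global option)
:set encoding?
" Show the encoding of the file behind the current buffer (buffer-local option)
:setlocal fileencoding?
" Re-read the current file, forcing Vim to decode it as latin1 (example value)
:edit ++enc=latin1
" Ask Vim to convert the buffer to UTF-8 on the next write
:setlocal fileencoding=utf-8
:write
```

When the two options differ, Vim converts between them on reading and writing, so the bytes on disk can stay in one encoding while the buffer is held in another.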

Heuristics

  • As long as all characters are recognizable, you don't need to care.
  • If certain files don't look right, you have to teach Vim to recognize the encoding via 'fileencodings', or explicitly specify it.
  • If certain characters do not show up right, you need to switch the 'encoding'. With utf-8, you're on the safe side.
  • If you have problems in the terminal only, fiddle with 'termencoding'.
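
As a rough illustration of these heuristics, a vimrc fragment might look like the one below. The specific values are assumptions to adapt, not settings taken from the answer.

```vim
" Internal representation; best set early in the vimrc
set encoding=utf-8
" Encodings tried in order when reading an existing file.
" An 8-bit encoding such as latin1 never fails, so it goes last.
set fileencodings=ucs-bom,utf-8,latin1
" Only for terminal Vim whose terminal is not UTF-8 (example value, commented out)
" set termencoding=latin1
```

If one particular file is still detected wrongly, it can be reopened with an explicit encoding via :edit ++enc=..., which leaves the global settings alone.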

As you can see, though it can be confusing to the beginner, you actually have all the power available to you!

answered Dec 22 '22 08:12 by Ingo Karkat