
How is utf-8 coded string printed to screen in C with printf?

For the following C code:

char s[] = "这个问题";
printf("%s", s);

The file command reports the source file as "UTF-8 Unicode C program text".

How is the string encoded after compilation? Is it also UTF-8 in the .out file?

When the binary is executed from bash, how is the string encoded in memory? Is it also UTF-8?

Then, how does bash know the encoding scheme and display the right characters?

Finally, once bash knows what to show, how are the bytes translated to pixels on the screen? Is there some mapping from bytes to pixels?

Across all these steps, is there any encoding or decoding of UTF-8?

heLomaN asked Feb 26 '16 09:02




1 Answer

Assuming GCC, the manual says that the preprocessor first translates the character set of the incoming files to the so-called source character set, which for GCC is UTF-8. So for a UTF-8 file, nothing happens. The execution character set is then used for string constants, and for GCC that also defaults to UTF-8.

So your UTF-8 string "survives" and exists in the executable as a bunch of bytes in UTF-8 encoding.

The terminal also has a character set, and it has to match. The C program does nothing further to translate strings when printing them; they are written out as they are, byte for byte. If the terminal isn't set to UTF-8, you will just get garbage.

As I noted in a comment, bash has nothing to do with this.

unwind answered Oct 10 '22 21:10