I'm currently rewriting (a part of) the printf() function for a school project.
Overall, we were required to reproduce the behaviour of the function with several flags, conversions, length modifiers ...
The only thing I have left to do, and the part I'm stuck on, is the %C / %S (or %lc / %ls) conversions.
So far, I've gathered that wchar_t is a type that can store characters on more than one byte, in order to accept more characters or symbols and therefore be compatible with pretty much every language, regardless of its alphabet and special characters.
However, I wasn't able to find any concrete information on what a wchar_t looks like to the machine, its actual size (which apparently varies based on several factors, including the compiler, the OS ...) or how to actually write one.
Thank you in advance
Note that we are limited in the functions we are allowed to use. The only allowed functions are write(), malloc(), free(), and exit().
We must be able to code any other required function ourselves.
To sum this up, what I'm asking for here is some information on how to interpret and write "manually" any wchar_t character, with as little code as possible, so that I can try to understand the whole process and code it myself.
The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it represents a 16-bit wide character used to store Unicode encoded as UTF-16LE, the native character type on Windows operating systems.
A wide character is a computer character datatype that generally has a size greater than the traditional 8-bit character. The increased datatype size allows for the use of larger coded character sets.
Note that on AIX a wchar_t is 2 bytes in 32-bit mode (and 4 bytes in 64-bit mode).
A wide string literal is a null-terminated array of constant wchar_t that is prefixed by L and contains any graphic character except the double quotation mark ( " ), backslash ( \ ), or newline character. A wide string literal may also contain escape sequences and any universal character name.
A wchar_t is similar to a char in the sense that it is a number, but when displaying a char or wchar_t we don't want to see the number; we want the drawn character corresponding to that number. The mapping from number to character is defined by neither char nor wchar_t; it depends on the system. So in the end there is no difference in usage between char and wchar_t except for their sizes.
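To make the "it is just a number" point concrete, here is a minimal sketch in C. The name my_wcslen is hypothetical (a stand-in for the standard wcslen(), which the project rules would not allow). It assumes nothing about the size of wchar_t and only walks the array until it reaches the terminating zero element.

```c
#include <stddef.h>   /* size_t */
#include <wchar.h>    /* wchar_t */

/* Hypothetical stand-in for wcslen(): count the wchar_t elements
   that come before the terminating L'\0'. Each element is simply an
   integer; how many bytes it occupies is sizeof(wchar_t), which
   depends on the platform. */
size_t my_wcslen(const wchar_t *ws)
{
    size_t len = 0;

    while (ws[len] != L'\0')
        len++;
    return len;
}
```

The result counts elements, not bytes: the array behind L"abc" occupies 4 * sizeof(wchar_t) bytes, including the terminator.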
Given the above, the most trivial implementation of printf("%ls") is one where you know which encodings the system uses for char and wchar_t. For example, on my system, char is 8 bits and uses UTF-8, while wchar_t is 32 bits and uses UTF-32. So the printf implementation just converts from UTF-32 to UTF-8 and outputs the result.
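As a rough sketch of that conversion, using only write(): the function names below (utf8_encode, put_wchar, put_wstr) are made up for illustration, and the code assumes that each wchar_t holds one UTF-32 code point and that the output is expected to be UTF-8. On a system where wchar_t is 16 bits (UTF-16), surrogate pairs would have to be combined first, which this sketch does not do.

```c
#include <stddef.h>   /* size_t */
#include <unistd.h>   /* write */
#include <wchar.h>    /* wchar_t */

/* Encode one code point as UTF-8 into buf (room for 4 bytes).
   Returns the number of bytes produced, or 0 if cp is not a valid
   code point (a surrogate, or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Sketch of %lc: write one wide character, return bytes written or -1. */
int put_wchar(int fd, wchar_t wc)
{
    unsigned char buf[4];
    size_t n = utf8_encode((unsigned long)wc, buf);

    if (n == 0 || write(fd, buf, n) != (ssize_t)n)
        return -1;
    return (int)n;
}

/* Sketch of %ls: write a whole wide string, return bytes written or -1. */
int put_wstr(int fd, const wchar_t *ws)
{
    int total = 0;

    for (; *ws != L'\0'; ws++) {
        int n = put_wchar(fd, *ws);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```

Under those assumptions, put_wstr(1, some_wide_string) would emit the string on standard output, and the return value is the byte count that printf is expected to report. Precision and field width handling for %ls are left out.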
A more general implementation must support different and configurable encodings and may need to inspect what the current encoding is. In this case functions like wcsnrtombs() or iconv() must be used.
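For comparison, here is a sketch of what the locale-aware route could look like. It uses the standard wcrtomb() (the one-character relative of wcsnrtombs()), which is outside the list of functions allowed in the school project, so it is shown only to illustrate the idea. The name put_wstr_locale is hypothetical, and the output encoding is whatever the current locale selects.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>   /* memset */
#include <unistd.h>   /* write */
#include <wchar.h>    /* wchar_t, mbstate_t, wcrtomb */

/* Hypothetical helper: write ws to fd using the multibyte encoding
   of the current locale. Returns bytes written, or -1 on error. */
int put_wstr_locale(int fd, const wchar_t *ws)
{
    mbstate_t state;
    char buf[MB_LEN_MAX];
    int total = 0;

    memset(&state, 0, sizeof state);       /* initial conversion state */
    for (; *ws != L'\0'; ws++) {
        size_t n = wcrtomb(buf, *ws, &state);

        if (n == (size_t)-1)
            return -1;                     /* not representable in this locale */
        if (write(fd, buf, n) != (ssize_t)n)
            return -1;
        total += (int)n;
    }
    return total;
}
```

A program would normally call setlocale(LC_ALL, "") once at startup so that the user's encoding is picked up; in the default "C" locale, non-ASCII characters are typically rejected with an error.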