Understanding and writing wchar_t in C

Tags: c, printf, wchar-t, widechar

I'm currently rewriting (a part of) the printf() function for a school project. Overall, we were required to reproduce the behaviour of the function with several flags, conversions, length modifiers ...

The only thing left, and the part I'm stuck on, is the %C / %S conversions (or %lc / %ls).

So far, I've gathered that wchar_t is a type that can store characters on more than one byte, in order to accept more characters or symbols and therefore be compatible with pretty much every language, regardless of their alphabet and special characters.

However, I wasn't able to find any concrete information on what a wchar_t looks like to the machine, its actual length (which apparently varies with several factors, including the compiler, the OS ...) or how to actually write one.

Thank you in advance

Note that we are limited in the functions we are allowed to use. The only allowed functions are write(), malloc(), free(), and exit(). We must be able to code any other required function ourselves.

To sum up, what I'm asking for is some information on how to interpret and write any wchar_t character "manually", with as little code as possible, so that I can try to understand the whole process and code it myself.

asked Dec 10 '14 12:12

kRYOoX

1 Answer

A wchar_t is similar to a char in that it is just a number. When we display a char or a wchar_t, we don't want to see the number but the drawn character that corresponds to it. The mapping from numbers to characters is defined by neither char nor wchar_t; it depends on the system. So in the end there is no difference in how char and wchar_t are used, apart from their sizes.
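Since the size and the encoding both depend on the system, it can help to ask the compiler directly. The following is only a small sketch: the helper names wchar_bits() and wchar_is_unicode() are made up for illustration. It relies on the standard macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are Unicode code points (some systems that do use Unicode values still leave it undefined).

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width of wchar_t in bits on this platform.
   Commonly 32 on Linux and macOS and 16 on Windows, but not guaranteed. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: 1 if the implementation promises that wchar_t
   values are ISO 10646 (Unicode) code points, 0 if it makes no promise. */
int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A result of 0 from wchar_is_unicode() does not mean the values are not Unicode, only that the implementation does not state it through this macro.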

Given the above, the simplest implementation of printf("%ls") is one where you know which encodings the system uses for char and wchar_t. For example, on my system char is 8 bits and uses UTF-8, while wchar_t is 32 bits and uses UTF-32. The printf implementation then just converts from UTF-32 to UTF-8 and outputs the result.
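The UTF-32 to UTF-8 conversion described above can be sketched with nothing but write(). This is a minimal illustration, not a definitive implementation: it assumes each wchar_t holds one Unicode code point (UTF-32) and that the output should be UTF-8, and the names utf8_encode() and put_wide_string() are hypothetical. On a system where wchar_t is 16 bits and uses UTF-16, surrogate pairs would have to be combined first, which this sketch does not do.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: encode one code point as UTF-8 into out, which must
   have room for 4 bytes. Returns the number of bytes stored, or 0 if cp is
   not a valid Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: write a wide string to fd as UTF-8.
   Returns the number of bytes written, or -1 on an invalid character
   or a failed or short write. */
long put_wide_string(int fd, const wchar_t *s)
{
    unsigned char buf[4];
    long total = 0;

    for (; *s != L'\0'; s++) {
        size_t n = utf8_encode((unsigned long)*s, buf);
        if (n == 0 || write(fd, buf, n) != (ssize_t)n)
            return -1;
        total += (long)n;
    }
    return total;
}
```

For %lc, calling utf8_encode() on the single argument and writing the resulting bytes is enough; %ls is the loop in put_wide_string(). Writing character by character keeps the sketch short; a real printf would more likely fill a buffer and call write() once.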

A more general implementation must support different, configurable encodings and may need to inspect the current encoding. In that case, functions like wcsnrtombs() or iconv() are needed.
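As a rough sketch of that more general route, the standard wcrtomb() function (the one-character relative of wcsnrtombs()) converts a wide character to whatever multibyte encoding the current locale uses. The helper name put_wide_string_locale() is hypothetical, and this approach falls outside a "write() only" restriction, so it is shown for comparison only.

```c
#include <limits.h>
#include <string.h>
#include <unistd.h>
#include <wchar.h>

/* Hypothetical helper: write a wide string to fd in the encoding of the
   current locale. Returns the number of bytes written, or -1 if a character
   cannot be represented or a write fails or is short. */
long put_wide_string_locale(int fd, const wchar_t *s)
{
    mbstate_t state;
    char buf[MB_LEN_MAX];   /* large enough for any single character */
    long total = 0;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */
    for (; *s != L'\0'; s++) {
        size_t n = wcrtomb(buf, *s, &state);
        if (n == (size_t)-1)
            return -1;      /* not representable in the current encoding */
        if (write(fd, buf, n) != (ssize_t)n)
            return -1;
        total += (long)n;
    }
    return total;
}
```

Without a prior call to setlocale(), the program runs in the "C" locale, where typically only ASCII characters convert successfully; after setlocale(LC_ALL, "") the conversion follows the user's environment.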

answered Sep 17 '22 23:09

hdante
