This is an ANSI C question. I have the following code.
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main()
{
    if (!setlocale(LC_CTYPE, "")) {
        printf("Can't set the specified locale! "
               "Check LANG, LC_CTYPE, LC_ALL.\n");
        return -1;
    }
    wint_t c;
    while ((c = getwc(stdin)) != WEOF)
    {
        printf("%lc", c);
    }
    return 0;
}
I need full UTF-8 support, but even at this simplest level, can I improve this somehow? Why is wint_t used, and not wchar_t, with appropriate changes?
wint_t is capable of storing any valid value of wchar_t. A wint_t is also capable of taking on the result of evaluating the WEOF macro (note that a wchar_t might be too narrow to hold the result).
As @musiphil so nicely put in his comment, which I'll try to expand here, there is a conceptual difference between wint_t and wchar_t.
Their different sizes are a technical aspect that derives from the fact each has very distinct semantics:
wchar_t is large enough to store characters, or codepoints if you prefer. Whether it is signed or unsigned is implementation-defined. It is analogous to char, which on virtually all platforms is 8 bits wide and so limited to 256 values. So wide-character string variables are naturally arrays of, or pointers to, this type.
Now enter the wide-character functions, some of which need to be able to return any wchar_t plus additional statuses. So their return type must be able to represent at least one value beyond every valid wide character. So wint_t is used, which can express any wide char and also WEOF. Being a status, WEOF can be negative, in which case wint_t is signed. I say "can" because the C standard does not mandate it. But regardless of sign, status values need to be distinct from every valid wide character. They are only useful as return values, and are never meant to be stored as characters.
The analogy with "classic" char and int is great to clear any confusion: strings are not of type int [], they are char var[] (or char *var). And not because char is "half the size of int", but because that's what a string is.
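To make the classic analogy concrete, here is a minimal sketch of the narrow-character case. The helper name read_narrow is made up for illustration and is not a standard function. The return value of getc() is kept in an int so that EOF can be told apart from every real byte, and only after that check is the value stored as a char.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): copies up to cap-1 bytes
   from `in` into `buf`, NUL-terminates it, and returns the count stored. */
size_t read_narrow(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;                      /* int, so EOF is distinguishable from data */

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = getc(in)) != EOF)
        buf[n++] = (char)c;     /* c is known not to be EOF at this point */
    buf[n] = '\0';
    return n;
}
```

The status-carrying type (int) is only used for the return value; the string itself is still an array of char.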
Your code looks correct: c is used to check the result of getwc() so it is wint_t. And if its value is not WEOF, as your while condition tests, then it's safe to assign it to a wchar_t character (or a string array, pointer, etc).
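As a rough sketch of that last point, the wide-character version could look like the following. The helper name read_wide is an assumption for illustration only. The result of getwc() stays in a wint_t until it has been compared against WEOF, and only then is it converted to wchar_t and stored in the buffer.

```c
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a library function): copies up to cap-1 wide
   characters from `in` into `buf`, terminates it with L'\0', and returns
   the number of wide characters stored. */
size_t read_wide(FILE *in, wchar_t *buf, size_t cap)
{
    size_t n = 0;
    wint_t c;                   /* wint_t, so WEOF is representable */

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = getwc(in)) != WEOF)
        buf[n++] = (wchar_t)c;  /* c is known not to be WEOF here */
    buf[n] = L'\0';
    return n;
}
```

Note that getwc() returns WEOF both at end of file and on a read or encoding error, so a caller that cares about the difference can check ferror(in) or feof(in) after the loop.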