
wchar_t vs wint_t

Tags:

c

string

This is an ANSI C question. I have the following code.

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    if (!setlocale(LC_CTYPE, "")) {
        printf("Can't set the specified locale! "
               "Check LANG, LC_CTYPE, LC_ALL.\n");
        return -1;
    }
    wint_t c;
    while ((c = getwc(stdin)) != WEOF) {
        printf("%lc", c);
    }
    return 0;
}

I need full UTF-8 support, but even at this simplest level, can I improve this somehow? Why is wint_t used rather than wchar_t (with appropriate changes)?

Dervin Thunk asked Jul 04 '09 04:07


2 Answers

wint_t is capable of storing any valid value of wchar_t. A wint_t is also capable of taking on the result of evaluating the WEOF macro (note that a wchar_t might be too narrow to hold the result).

Brandon E Taylor answered Sep 30 '22 09:09


As @musiphil so nicely put in his comment, which I'll try to expand here, there is a conceptual difference between wint_t and wchar_t.

Any difference in their sizes is a technical detail that follows from the fact that each has distinct semantics:

  • wchar_t is large enough to store characters, or codepoints if you prefer. Its signedness is implementation-defined, just like that of plain char. It is analogous to char, which on virtually all platforms is limited to 8 bits (256 values). So wide-character string variables are naturally arrays of, or pointers to, this type.

  • Now enter the wide-character I/O functions, some of which need to be able to return any wchar_t plus an additional status. So their return type must be able to represent every wchar_t value and at least one more. So wint_t is used, which can express any wide character and also WEOF. Being a status, WEOF can be negative on platforms where wint_t is signed, but the C standard does not mandate a signedness: wint_t may just as well be unsigned, in which case WEOF is typically its largest value. Regardless of sign, a status value must not correspond to any valid wide character. Such values are only useful as return values and are never meant to be stored as characters.

The analogy with "classic" char and int helps clear up any confusion: strings are not of type int [], they are char var[] (or char *var). And not because char is "half the size of int", but because that's what a string is.

Your code looks correct: c is used to check the result of getwc(), so it is wint_t. And if its value is not WEOF, as your while loop tests, then it is safe to assign it to a wchar_t character (or to an element of a string array, through a pointer, etc.).

MestreLion answered Sep 30 '22 08:09