What is a "wide character string" in C language?

Tags:

c

string

widechar

I came across this in the book:

wscanf(L"%lf", &variable);

where the first parameter is of type wchar_t *.

This is different from scanf("%lf", &variable), where the first parameter is of type char *.

So what is the difference, then? I have never heard of a "wide character string" before. I have heard of something called raw string literals, which keep the string exactly as written (no need for things like escape sequences), but that was not in C.

asked Jul 02 '12 02:07

quantum231

2 Answers

The exact nature of wide characters is (purposefully) left implementation defined.

When they first invented the concept of wchar_t, ISO 10646 and Unicode were still competing with each other (whereas now they mostly cooperate). Rather than decree that an international character would be one or the other (or possibly something else entirely), they simply provided a type (and some functions) that each implementation could define to support international character sets as it chose.

Different implementations have exercised that potential for variation. For example, if you use Microsoft's compiler on Windows, wchar_t will be a 16-bit type holding UTF-16 Unicode (originally it held UCS-2 Unicode, but that's now officially obsolete).

On Linux, wchar_t will more often be a 32-bit type, holding UCS-4/UTF-32 encoded Unicode. Ports of gcc to at least some other operating systems do the same, though I've never tried to confirm that it's always the case.

There is, however, no guarantee of that. At least in theory an implementation on Linux could use 16 bits, or one on Windows could use 32 bits, or either one could decide to use 64 bits (though I'd be a little surprised to see that in reality).

In any case, the general idea is that a single wchar_t is sufficient to represent a code point. For I/O, the data is intended to be converted from the external representation (whatever it is) into wchar_ts, which is supposed to make them relatively easy to manipulate. Then, during output, they get transformed again into the encoding of your choice (which may be entirely different from the encoding you read).

answered Sep 19 '22 10:09

Jerry Coffin

"Wide character string" is referring to the encoding of the characters in the string.

From Wikipedia:

A wide character is a computer character datatype that generally has a size greater than the traditional 8-bit character. The increased datatype size allows for the use of larger coded character sets.

UTF-16 is one of the most commonly used wide character encodings.

Further, Microsoft defines wchar_t as an unsigned short (16-bit) data object. Other operating systems and compilers may define it differently, and often do.

From the Wikipedia article linked in the comments:

"The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers."

answered Sep 20 '22 10:09

Chris Dargis
