
UTF-16 string terminator

What is the string terminator sequence for a UTF-16 string?

EDIT:

Let me rephrase the question in an attempt to clarify: how does the call to wcslen() work?

Ray asked May 07 '11 20:05



2 Answers

Unicode does not define string terminators. Your environment or language does. For instance, C strings use a zero value (0x0) as the string terminator, whereas .NET strings store the length of the string in a separate value in the String class.

To answer your second question: wcslen looks for a terminating L'\0' character. As I read it, that is a single wchar_t with the value zero, so the number of 0x00 bytes depends on the compiler's wchar_t size. If you're using UTF-16, it will likely be the two-byte sequence 0x00 0x00 (encoding U+0000, NUL).
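To make the scan concrete, here is a minimal sketch of what a wcslen-style loop could look like. The name my_wcslen is made up for illustration, and real library implementations are usually more optimized. The point is only that the comparison is done on whole wchar_t units, not on individual bytes.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for wcslen (not the library's actual code):
   counts the wide characters that precede the first L'\0'. */
size_t my_wcslen(const wchar_t *s)
{
    const wchar_t *p = s;

    /* Each step compares one whole wchar_t against zero, so a single
       0x00 byte inside a non-zero wide character does not end the scan. */
    while (*p != L'\0') {
        ++p;
    }
    return (size_t)(p - s);
}
```

Under these assumptions, my_wcslen(L"abc") should agree with wcslen(L"abc"), whether wchar_t is 2 or 4 bytes wide on your compiler.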

Michael Petrotta answered Oct 21 '22 18:10


7.24.4.6.1 The wcslen function (from the Standard)

...

   [#3]   The  wcslen  function  returns  the  number  of  wide
   characters that precede the terminating null wide character.

And the null wide character is L'\0', that is, a wchar_t whose value is zero.
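Since wchar_t is not 16 bits on every platform, the same idea can be sketched for UTF-16 specifically using C11's char16_t. The helper name utf16_len is hypothetical, and the sketch assumes the compiler encodes u"..." literals as UTF-16 (which is the case when __STDC_UTF_16__ is defined).

```c
#include <stddef.h>
#include <uchar.h>   /* char16_t (C11) */

/* Hypothetical helper: returns the number of UTF-16 code units that
   precede the terminating 0x0000 code unit. */
size_t utf16_len(const char16_t *s)
{
    size_t n = 0;

    /* The terminator is one 16-bit unit equal to zero, i.e. the bytes
       0x00 0x00. A lone 0x00 byte, such as the high byte of u'A'
       (0x0041), is not a terminator. */
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Note that this counts code units, not characters: a code point outside the Basic Multilingual Plane is stored as a surrogate pair and contributes two units to the result.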

pmg answered Oct 21 '22 16:10