What is the string terminator sequence for a UTF-16 string?
EDIT:
Let me rephrase the question in an attempt to clarify. How does the call to wcslen() work?
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, called NUL in this article).
Yes, UTF-8 defines 0x0 as NUL: the code point U+0000 is encoded as the single byte 0x00.
No, but if you call temp.c_str(), a null terminator will be included in the return value of this method. It's also worth saying that you can include a null character in a string just like any other character.
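The two points above (c_str() appends a terminator, and a null character can sit inside the string) can be sketched in C++ as follows. The helper names make_sample, tracked_length, and c_length are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds the 5-character string "ab\0cd".
// The explicit length is needed so the constructor copies past the embedded NUL.
std::string make_sample() {
    return std::string("ab\0cd", 5);
}

// Length as std::string tracks it: the embedded NUL counts as a character.
std::size_t tracked_length(const std::string& s) {
    return s.size();
}

// Length as a C API sees it: strlen stops at the first NUL it finds
// in the buffer returned by c_str().
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For the sample string, tracked_length reports 5 while c_length reports 2, and make_sample().c_str()[5] is the terminator that c_str() guarantees.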
A null-terminated string is a sequence of characters whose last element is a null character (denoted by '\0'). When we write a string literal in double quotes ("..."), the compiler stores it as a null-terminated string.
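A small compile-time check can make the compiler's behavior visible. This is only a sketch; the array names abc and wide_abc are made up for the example.

```cpp
#include <cstddef>

// The literal "abc" has three visible characters; the compiler appends '\0',
// so the array holds four elements.
constexpr char abc[] = "abc";
static_assert(sizeof(abc) == 4, "three characters plus the terminator");
static_assert(abc[3] == '\0', "the last element is the null character");

// A wide literal is terminated by a null wide character instead, so the
// terminator occupies sizeof(wchar_t) bytes rather than one.
constexpr wchar_t wide_abc[] = L"abc";
static_assert(sizeof(wide_abc) == 4 * sizeof(wchar_t),
              "three wide characters plus the wide terminator");
static_assert(wide_abc[3] == L'\0', "the last element is the null wide character");
```

If any of these assumptions were wrong, the file would fail to compile rather than misbehave at run time.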
Unicode does not define string terminators. Your environment or language does. For instance, C strings use 0x0 as a string terminator, whereas .NET strings use a separate value in the String class to store the length of the string.
To answer your second question, wcslen looks for a terminating L'\0' character. As I read it, that terminator is as many 0x00 bytes as a wide character is wide, which depends on the compiler, but it will likely be the two-byte sequence 0x00 0x00 if you're using UTF-16 (encoding U+0000, 'NUL').
7.24.4.6.1 The wcslen function (from the Standard)
...
[#3] The wcslen function returns the number of wide characters that precede the terminating null wide character.
And the null wide character is L'\0'.
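To illustrate what the Standard describes, here is a rough sketch of the loop wcslen performs, plus the same loop over explicit UTF-16 code units. The names count_wide_chars, count_utf16_units, and wide_terminator_bytes are hypothetical, and real library implementations are usually more optimized than this.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical re-implementation of the idea behind wcslen: count the wide
// characters that precede the terminating null wide character L'\0'.
std::size_t count_wide_chars(const wchar_t* s) {
    std::size_t n = 0;
    while (s[n] != L'\0') {
        ++n;
    }
    return n;
}

// The same loop for an explicitly UTF-16 string: count 16-bit code units
// until the code unit 0x0000. A character outside the Basic Multilingual
// Plane is stored as a surrogate pair and therefore counts as two units.
std::size_t count_utf16_units(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != u'\0') {
        ++n;
    }
    return n;
}

// Size in bytes of the terminator that wcslen looks for on this platform.
// This is implementation-defined: commonly 2 where wchar_t holds UTF-16
// code units and 4 where it holds UTF-32 code points.
constexpr std::size_t wide_terminator_bytes = sizeof(wchar_t);
```

Comparing on whole code units rather than on bytes matters: a UTF-16 string such as u"A" contains a 0x00 byte inside the code unit for 'A', so a byte-wise scan like strlen would stop too early.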