I am brushing up on my C++ and stumbled across a curious behavior regarding strings, character arrays, and the null character ('\0'). The following code:
#include <iostream>
#include <string>
using namespace std;
int main() {
    cout << "hello\0there"[6] << endl;
    char word [] = "hello\0there";
    cout << word[6] << endl;
    string word2 = "hello\0there";
    cout << word2[6] << endl;
    return 0;
}
produces the output:
> t
> t
>
What is going on behind the scenes? Why do the string literal and the declared char array store the 't' at index 6 (after the internal '\0'), but the declared string does not?
C-style strings are actually one-dimensional arrays of characters terminated by a null character, '\0'. Thus a null-terminated string contains the characters that make up the string, followed by a null character.
A null string has no values: it holds no characters. The string still exists in memory, so it is not a NULL pointer; it just has no content. An empty string has a single element, the null character '\0'.
Most string-manipulating functions rely on the terminating null character ('\0', not to be confused with the NULL pointer) to know when the string is finished and their job is done. They won't work correctly on a plain char array that lacks the terminator: they'll keep going past the end of the array until they find a '\0' somewhere in memory, and functions that write can corrupt memory as they go.
From what I remember, the first two are in essence just arrays, and a string is printed by continuing to print characters until a \0 is encountered. Thus in the first two examples you index straight into the array at offset 6 and print only the single character stored there, which is t.
What happens with the string class is that it copies the string into its own internal buffer, and it does so by copying from the start of the array up to the first \0 it finds. Thus the t is not stored, because it comes after the first \0.
Because the std::string constructor that takes a const char* treats its argument as a C-style string. It simply copies from it until it hits a null terminator, then stops copying.
So your last example is actually invoking undefined behaviour: word2 holds only five characters, so word2[6] goes past the end of the string.
You are constructing a string from a char* (or something that decayed to that). This means that the conventions for C strings apply: they are '\0'-terminated. That's why word2 only contains "hello".