Why does the size of this std::string change, when characters are changed?

I have an issue in which the size of a string is affected by the presence of a '\0' character. I searched all over SO and still could not find an answer.

Here is the snippet.

#include <iostream>
#include <string>

int main() {
    std::string a = "123123\0shai\0";
    std::cout << a.length();
}

http://ideone.com/W6Bhfl

The output in this case is

6 

Whereas the same program with a different string, having digits instead of letters after the '\0',

#include <iostream>
#include <string>

int main() {
    std::string a = "123123\0123\0";
    std::cout << a.length();
}

http://ideone.com/mtfS50

gives an output of

8 

What exactly is happening under the hood? How does the presence of a '\0' character change the behavior?

samairtimer asked Nov 08 '16 07:11




2 Answers

The sequence \012, when used in a string (or character) literal, is an octal escape sequence. It is the octal number 12 (decimal 10), which corresponds to the ASCII line feed ('\n') character.

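That means your second string is actually equal to "123123\n3\0" (plus the actual string literal terminator). The std::string constructor that takes a const char* stops at the first null character, so it sees 8 characters here, and only the 6 characters before the '\0' in your first string.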

It would have been very clear if you tried to print the contents of the string.

Octal sequences are one to three digits long, and the compiler will use as many digits as possible.

Some programmer dude answered Sep 26 '22 03:09



If you check the syntax highlighting on ideone, you will see that \012 has a different color. That is because it is a single character written in octal.

Bo Persson answered Sep 25 '22 03:09
