
Understanding C++ strings

Tags: c++, string, c++11

I'm trying to understand how strings really work in C++ because I just got really confused after coming across an unexpected behavior.

Considering a string, I insert a character (not using append()) using [] operator:

string str;
str[0] = 'a';

Let's print the string:

cout << "str:" << str << endl;

I get an empty string as output:

str:

Ok, let's try printing the only character in the string:

cout << "str[0]:" << str[0] << endl;

Output:

str[0]:a

Q1. What happened there? Why was a not printed in the first case?

Now, I do something that should throw a compilation error but it doesn't and my question is again, why.

str = 'ABC';

Q2. How's that not an incorrect semantic i.e. assigning a character (which is not really a character but essentially a string in single quotes) to a string?

Now, worse when I print the string, it always prints last character i.e C (I was expecting first character i.e. A):

cout << "str:" << str << endl;

Output:

str:C

Q3. Why was the last character printed, not first?

Duh asked Dec 24 '22 22:12


1 Answer

Considering a string, I insert a character (not using append()) using [] operator:

string str;
str[0] = 'a';

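You did not insert a character. operator[](size_type pos) returns a reference to the character that already exists at pos. If pos > size(), the behaviour is undefined. If pos == size(), then since C++11 the operator returns a reference to the null terminator, and modifying that object to anything other than the null character is undefined behaviour. Your string is empty, so size() == 0, and therefore the assignment str[0] = 'a' has undefined behaviour.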

Q1. What happened there? Why was a not printed in the first case?

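The behaviour is undefined, so no particular output is guaranteed. What you observed is typical, though: the write lands in the string's internal buffer without changing size(), so operator<< still sees an empty string, while str[0] reads the buffer directly.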
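For contrast, here is a minimal sketch of ways to put a first character into an empty string without undefined behaviour. The function names are made up for illustration; only push_back(), resize(), operator[] and at() come from std::string.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: push_back() grows the string by one character.
std::string first_char_by_push_back() {
    std::string str;
    str.push_back('a');
    assert(str.size() == 1);
    return str;
}

// Hypothetical helper: resize() creates the element first, so str[0]
// refers to a character that already exists.
std::string first_char_by_resize() {
    std::string str;
    str.resize(1);
    str[0] = 'a';
    return str;
}

// Hypothetical helper: at() checks the index and throws
// std::out_of_range instead of invoking undefined behaviour.
bool at_rejects_index_on_empty_string() {
    std::string str;
    try {
        str.at(0) = 'a';
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Streaming the result of either of the first two helpers to cout prints a, because size() is 1 by the time the string is printed.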


Now, I do something that should throw a compilation error but it doesn't and my question is again, why.

str = 'ABC';

Q2. How's that not an incorrect semantic i.e. assigning a character ... to a string?

Assigning a character to a string is semantically valid. It sets the content of the string to that single character.

Q2. ... a character (which is not really a character but essentially a string in single quotes) ...

It is a multicharacter literal. The type of a multicharacter literal is int. If the compiler supports multicharacter literals, then the code is semantically valid.

There isn't an assignment operator for string that would accept an int. However, int is implicitly convertible to char, so the assignment operator that accepts a char is used after the conversion.

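char cannot necessarily represent all the values that int can, so the value may be out of range for the conversion. This is not undefined behaviour: if char is a signed type, the resulting value is implementation-defined before C++20. Since C++20 (and for an unsigned char in every version), the value is reduced modulo 2^N, where N is the width of char.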


Q3. Why was the last character printed, not first?

The value of a multicharacter literal is implementation-defined. You'll need to consult the manual of your compiler to find out whether multicharacter literals are supported, and what value you should expect. Furthermore, you'll need to consider the fact that the char that the value is converted to probably cannot represent all values of int.
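As one concrete illustration: GCC documents that it builds a multicharacter constant one character at a time, shifting the previous value left by the width of a character before or-ing in the next one. The sketch below computes that value by hand instead of using the literal (an assumption about your compiler, not something the standard guarantees) and keeps only the low-order byte, which is what the conversion to an 8-bit char does on common implementations. The constant and helper names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed value of 'ABC' under GCC's documented scheme with 8-bit chars:
// each earlier character is shifted left as the next one is added.
const int kAssumedValueOfABC = ('A' << 16) | ('B' << 8) | 'C';

// Keeps only the low-order 8 bits of the int, i.e. the last character.
char low_byte(int value) {
    return static_cast<char>(value & 0xFF);
}

// Mirrors `str = 'ABC';`: the int becomes a single char, and
// std::string::operator=(char) stores exactly that one character.
std::string assign_from_int(int value) {
    std::string str;
    str = low_byte(value);
    assert(str.size() == 1);
    return str;
}
```

Under these assumptions, assign_from_int(kAssumedValueOfABC) yields the one-character string "C", which matches the output in the question.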


but I didn't get any warnings

Then consider getting a better compiler. This is what GCC warns:

warning: multi-character character constant [-Wmultichar]

 str = 'ABC';

warning: overflow in implicit constant conversion [-Woverflow]


str[0] = 'a' should work with string just like it does with char str[] = "" (but it doesn't as we saw). Can you help me understand why [] operator has different behavior in dealing with array of characters than string?

Because that's how the standard has defined the behaviour and requirements of std::string.

char str[] = "";

Creates an array of size 1, consisting of the null terminator. This element of the array is like any other, and you can freely modify it:

str[0] = 'a';

This is well defined and OK. But now str no longer contains a null-terminated string, so trying to use it as such has undefined behaviour:

cout << "str:" << str << endl; // oops, str is not a null-terminated string
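A minimal sketch of the array case, with hypothetical function names: the array deduced from "" has exactly one element, so keeping the contents a valid C string after writing 'a' requires an array with room for the terminator as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The array size is deduced from the literal: just the terminator.
std::size_t size_of_array_from_empty_literal() {
    char str[] = "";
    return sizeof str;
}

// With two elements, the second one is still '\0' after the write,
// so the array remains a valid null-terminated string.
std::size_t length_after_write_with_room() {
    char str[2] = "";   // remaining elements are zero-initialised
    str[0] = 'a';
    assert(str[1] == '\0');
    return std::strlen(str);
}
```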

So, std::string has been designed such that you cannot mess with the final null terminator, as long as you obey the requirements of std::string. Forbidding modification of the null terminator also lets the implementation avoid allocating a memory buffer for an empty string. Skipping the allocation is cheaper than performing it, so this is a good thing.
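The terminator can be observed without modifying anything. A small sketch (the function name is hypothetical): since C++11, an empty std::string still exposes a null character through c_str() and through a read of str[size()].

```cpp
#include <cassert>
#include <string>

// Reads, but never writes, the terminator of an empty string.
bool empty_string_is_null_terminated() {
    const std::string str{};
    assert(str.empty());
    return str.c_str()[0] == '\0' && str[str.size()] == '\0';
}
```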

eerorika answered Jan 08 '23 15:01