
Legal to overwrite std::string's null terminator?

In C++11, we know that std::string is guaranteed to be both contiguous and null-terminated (or more pedantically, terminated by charT(), which in the case of char is the null character 0).

There is this C API I need to use that fills in a string by pointer. It writes the whole string + null terminator. In C++03, I was always forced to use a vector<char>, because I couldn't assume that string was contiguous or null-terminated. But in C++11 (assuming a properly conforming basic_string class, which is still iffy in some standard libraries), I can.

Or can I? When I do this:

std::string str(length, '\0');

The string will allocate length+1 bytes, with the last filled in by the null-terminator. That's good. But when I pass this off to the C API, it's going to write length+1 characters. It's going to overwrite the null-terminator.

Admittedly, it's going to overwrite the null-terminator with a null character. Odds are good that this will work (indeed, I can't imagine how it couldn't work).

But I don't care about what "works". I want to know, according to the spec, whether it's OK to overwrite the null-terminator with a null character?
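For comparison, the C++03-style `vector<char>` workaround mentioned above might look like the following minimal sketch. The function `c_api_fill` is a hypothetical stand-in for the C API (the real one is not named in the question), and `read_via_vector` is an illustrative helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the C API (assumption, not from the question):
// writes `length` characters followed by a null terminator,
// i.e. length + 1 bytes in total, into `out`.
extern "C" void c_api_fill(char* out, std::size_t length) {
    std::memset(out, 'x', length);
    out[length] = '\0';
}

// C++03-style approach: let the API write into a vector<char> that has
// room for the terminator, then copy the characters into a std::string.
std::string read_via_vector(std::size_t length) {
    std::vector<char> buffer(length + 1);    // value-initialized to '\0'
    c_api_fill(&buffer[0], length);          // API writes length + 1 bytes
    assert(buffer[length] == '\0');          // terminator is inside the vector
    return std::string(&buffer[0], length);  // copy without the terminator
}
```

The cost of this approach is the extra copy from the vector into the string, which is what writing directly into a `std::string` would avoid.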

Nicol Bolas asked Oct 05 '12 06:10




1 Answer

Unfortunately, this is UB, if I interpret the wording correctly (in any case, it's not allowed):

§21.4.5 [string.access] p2

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

(Editorial error that it says T not charT.)

.data() and .c_str() basically point back to operator[] (§21.4.7.1 [string.accessors] p1):

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
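Under the wording quoted above, the only element that must not be modified is the one at index `size()`. One way to stay within that rule, offered here as a sketch and not as something the answer itself proposes, is to size the string one character larger than needed, so that the terminator written by the API lands on an ordinary element, and then shrink the string afterwards. The function `c_api_fill` is a hypothetical stand-in for the C API, and `read_via_string` is an illustrative helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C API (assumption, not from the question):
// writes `length` characters followed by a null terminator,
// i.e. length + 1 bytes in total, into `out`.
extern "C" void c_api_fill(char* out, std::size_t length) {
    std::memset(out, 'x', length);
    out[length] = '\0';
}

// Size the string to length + 1 so that the API's terminator is written
// to index `length`, which is < size() and therefore a modifiable element.
std::string read_via_string(std::size_t length) {
    std::string str(length + 1, '\0');
    assert(str.size() == length + 1);
    c_api_fill(&str[0], length);  // relies on C++11 contiguous storage
    str.resize(length);           // drop the extra element; the string
                                  // maintains its own terminator
    return str;
}
```

This never writes through the reference returned for `pos == size()`, so the "shall not be modified" clause is not triggered, at the price of one extra character of size during the call.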

Xeo answered Sep 23 '22 02:09