
Does g++ meet the C++11 requirements for std::string?

Consider the following example:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string x = "hello";
    // The copy constructor is called here.
    string y(x);
    // c_str() returns const char*, but casting the const away like this is quite popular.
    char* temp = (char*)y.c_str();

    temp[0] = 'p';

    cout << "x = " << x << endl;
    cout << "y = " << y << endl;

    cin >> x; // wait for input so the console window stays open
    return 0;
}

Compile and run it with Visual Studio and with g++. When I did so, I got two different results.
With g++:

x = pello  
y = pello

With Visual Studio 2010:

x = hello  
y = pello

The difference is most likely because g++'s std::string implementation uses copy-on-write (COW) and Visual Studio's does not.
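To make the suspected mechanism concrete, here is a minimal sketch of a toy copy-on-write string. CowString and its members are made-up names for illustration only; this is not how libstdc++ is actually written, just the general technique: copies share one buffer, and a write through the class interface detaches first. A write that bypasses the interface (via a cast on c_str()) never triggers the detach, which would explain both strings changing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy class, not a real library type.
class CowString {
public:
    explicit CowString(const char* text)
        : buf_(std::make_shared<std::vector<char> >(
              text, text + std::strlen(text) + 1)) {}  // +1 keeps the '\0'

    // The implicit copy constructor copies only the shared_ptr,
    // so a copy and its source point at the same characters.

    // Read-only access never detaches.
    const char* c_str() const { return buf_->data(); }

    // A write through the interface takes a private copy first
    // if the buffer is currently shared.
    void set(std::size_t i, char c) {
        assert(i + 1 < buf_->size());  // never overwrite the terminator
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::vector<char> >(*buf_);
        }
        (*buf_)[i] = c;
    }

    std::string str() const { return std::string(buf_->data()); }

private:
    std::shared_ptr<std::vector<char> > buf_;
};
```

With this sketch, copying x into y and then calling y.set(0, 'p') leaves x untouched, while writing through const_cast<char*>(y.c_str()) changes the characters both objects see, because no detach ever happens.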

Now, the C++11 standard (page 616, Table 64) states the following about the string copy constructor

basic_string(const basic_string& str):

Effects:
data() "points at the first element of an allocated copy of the array whose first element is pointed at by str.data()"

As I understand it, this means COW is not allowed.
How can that be?
Does g++ meet the C++11 requirements for std::string?

Before C++11 this did not pose a big problem since c_str didn't return a pointer to the actual data the string object holds, so changing it didn't matter. But after the change, this combination of COW and returning the actual pointer can and does break old applications (applications that deserve it for bad coding, but nevertheless).

Do you agree with me? If yes, can something be done? Does anyone have an idea about how to go about it in a very big, old code environment (a clockwork rule to catch this would be nice)?

Note that even without casting the constness away, one might cause invalidation of a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write).
Another example, without casting the constness away:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string x = "hello";
    // The copy constructor is called here.
    string y(x);

    //y[0] = 'p';

    // c_str() returns const char*; no cast is used this time.
    const char* temp = y.c_str();

    y[0] = 'p';

    // Now we expect "pello", because the standard says the pointer points to the actual data,
    // but we get "hello".
    cout << "temp = " << temp << endl;

    return 0;
}
asked Jan 29 '14 12:01 by buc030



4 Answers

You're right that COW is disallowed. But GCC hasn't updated its implementation yet, allegedly due to ABI constraints. A new implementation, designed eventually to supplant the std::string implementation, can be found as ext/vstring.h.

The fix for another bug in libstdc++'s std::string (not this one) is not going to make it into GCC 4.9; Jonathan indicates on the bug report that it has only been fixed for vstring so far. My guess, then, is that the COW issue will be resolved around the same time.

Despite all this, casting away constness then mutating is pretty much always a bad idea: though you're correct that this should in practice be safe with a fully C++11-compliant string implementation, you're making assumptions and this very problem proves that you cannot always rely on those assumptions to hold. So, while your code example may be "popular", it's popular in poor code, and shouldn't be written even now. And, of course, writing that in C++03 is flat-out incompetence!
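For comparison, a minimal sketch of getting a writable pointer without any cast. set_first_char is a hypothetical helper name; the point is only that the non-const operator[] is part of the string's mutable interface, so a COW implementation has to unshare before handing out the reference.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: overwrite the first character through the
// string's own non-const interface instead of casting c_str().
void set_first_char(std::string& s, char c) {
    assert(!s.empty());
    char* p = &s[0];  // non-const access: a shared buffer must be detached here
    p[0] = c;
}
```

Called on y after string y(x), this should leave x as "hello" and turn y into "pello" on both COW and non-COW implementations.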

answered Oct 30 '22 03:10 by Lightness Races in Orbit


libstdc++'s implementation does not conform to C++11, but that doesn't mean your code is guaranteed to produce the results you expect.

Doing anything to modify the values stored in the character array returned by c_str() results in undefined behavior. The standard explicitly says this:

21.4.7.1 basic_string accessors

const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
3 Requires: The program shall not alter any of the values stored in the character array.

Although I quote C++11 above, the requirement not to alter the array was also true of C++03.
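The Returns clause quoted above can be read as a checkable property. The following is only a sketch (accessor_invariant_holds is a made-up name) that walks i over [0, size()] and compares addresses; it reads through the const interface and never writes, so it stays within the Requires clause.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical checker for: p + i == &operator[](i) for each i in [0, size()].
bool accessor_invariant_holds(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        // On a const string, operator[](size()) refers to the terminator.
        if (p + i != &s[i]) {
            return false;
        }
    }
    return p == s.data();  // C++11 specifies c_str() and data() together
}
```

On a C++11-conforming library this is expected to return true for any string, including the empty one.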


Does anyone have an idea about how to go about it in a very big, old code environment (a clockwork rule to catch this would be nice)?

Hopefully you have a decent test suite. Making significant changes to large, legacy code-bases is not really practical otherwise. The easier and faster it is to run the test suite the easier and faster it will be to fix the code.

On a very large codebase, auditing all uses of c_str() may be very expensive. However, taking a sample and checking what sorts of uses are made of it, and what specific corrections could be applied, can help you gauge the scale of the problem. In my experience you can expect a wide variety of weird things, but some will be more common.
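As a starting point for that kind of sampling, here is a rough textual heuristic, sketched with std::regex. It is an assumption-laden illustration, not a real static-analysis rule: find_suspicious_casts is a made-up name, it only looks at one line at a time, and it will miss casts spread over several lines or calls made through ->. It flags lines where a C-style cast or const_cast to char* is applied to the result of c_str() or data().

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based numbers of lines that look like
// "(char*)something.c_str()" or "const_cast<char*>(something.c_str())".
std::vector<int> find_suspicious_casts(const std::string& source) {
    static const std::regex pattern(
        R"((\(\s*char\s*\*\s*\)|const_cast\s*<\s*char\s*\*\s*>)[^;]*\.\s*(c_str|data)\s*\(\s*\))");
    std::vector<int> hits;
    std::istringstream in(source);
    std::string line;
    int number = 0;
    while (std::getline(in, line)) {
        ++number;
        if (std::regex_search(line, pattern)) {
            hits.push_back(number);
        }
    }
    return hits;
}
```

Feeding it the text of a source file gives a list of line numbers to review by hand; a plain const char* temp = y.c_str(); is deliberately not flagged.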

Valgrind, debug implementations of std::string, and other tools can help identify some instances which are likely to cause real bugs. Fixing those first is the high priority. The fixes will likely involve updating APIs to be const-correct or to have well defined lifetime requirements, and switching uses of c_str() for something that produces C strings with appropriate lifetimes. Your survey of the code should have informed you as to the general variety of lifetime requirements and c-string creating utilities that will be necessary.
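One example of a "c-string creating utility" with an explicit lifetime, sketched under the assumption that the legacy API wants a writable, null-terminated char*. to_mutable_c_buffer is a hypothetical name; the idea is simply to hand the API a buffer the caller owns instead of the string's internals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copy the string into a caller-owned, writable,
// null-terminated buffer. The buffer lives as long as the returned vector.
std::vector<char> to_mutable_c_buffer(const std::string& s) {
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    assert(buf.back() == '\0');
    return buf;
}
```

A call site that used (char*)y.c_str() could instead keep the vector in a local variable and pass buf.data() to the C function; writes then land in the copy, never in y.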

Other uses of c_str() can be modified incrementally over time as a lower priority, side activity.

Tools like some of those built on top of clang for refactoring or semantic search are another option for identifying problems and making large-scale changes. However, it's often a big task just to get legacy code into a legal enough shape for clang tools to process it. (Here's a talk about some work Google did on this. There are also more recent talks they've done on commodity versions of this technology which Google has made available.)


I often have a hard time convincing people that 'undefined behavior' is actually a problem even in instances when no ill effects are actually observed. As you write new code remember from this experience that the lives of future maintainers will be made much easier if you conform to the C++ spec. Even if some particular instance of 'bad' code doesn't cause you problems now, that is likely to change over time as compilers and library implementations change. And even when the spec changes, the committee is careful to consider the effects on conformant legacy code. If code isn't conformant then it really doesn't get any consideration and you end up with problems like this.

answered Oct 30 '22 03:10 by bames53


Does g++ meet the C++11 requirements for std::string?

No.

Before C++11 this did not pose a big problem since c_str didn't return a pointer to the actual data the string object holds, so changing it didn't matter.

This is incorrect: c_str was always allowed to return the actual data, and that's exactly what it did in all popular C++03 implementations.

But after the change, this combination of COW and returning the actual pointer can and does break old applications (applications that deserve it for bad coding, but nevertheless).

After what change? G++ did not change its std::string so if your old program is broken using G++ then it was always broken.

Note that even without casting the constness away, one might cause invalidation of a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write).

Your second example doesn't demonstrate any invalidation, because in a COW implementation temp is still a valid pointer while x exists. But it's possible to modify the example so that temp is invalidated, and that's not allowed in C++11: [string.require]/6 says that y[0] is not allowed to invalidate the pointer returned by c_str().
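A small sketch of what that rule means in practice. index_write_keeps_pointer is a made-up name, and the expected result assumes a C++11-conforming library; with a COW std::string (such as libstdc++ before the change) the comparison may well come out false when the buffer is shared.

```cpp
#include <cassert>
#include <string>

// Hypothetical probe: does the pointer from c_str() still refer to the
// string's data after a write through non-const operator[]?
bool index_write_keeps_pointer(std::string& y, char c) {
    assert(!y.empty());
    const char* before = y.c_str();
    y[0] = c;
    // Only read through 'before' once we know it still matches the string.
    return before == y.c_str() && before[0] == c;
}
```

Applied to the second example from the question (string y(x); then writing 'p' at index 0), a conforming implementation is expected to return true, and x stays "hello".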

answered Oct 30 '22 01:10 by Jonathan Wakely


The other answers were correct at the time, but according to the GCC 5.x change log, libstdc++ as shipped with GCC 5 is now fully C++11 conformant.
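If you need to know which string a given build actually uses, the following sketch may help. The function names are made up; _GLIBCXX_USE_CXX11_ABI is libstdc++'s dual-ABI macro (0 selects the old reference-counted string), and the runtime probe simply assumes that a COW string reports the same data() address for a fresh copy and its source.

```cpp
#include <cassert>
#include <string>

// Compile-time view: true when building against libstdc++ with the old
// std::string ABI (either a pre-GCC 5 library or the macro set to 0).
bool built_with_cow_abi() {
#if defined(__GLIBCXX__) && \
    (!defined(_GLIBCXX_USE_CXX11_ABI) || _GLIBCXX_USE_CXX11_ABI == 0)
    return true;
#else
    return false;
#endif
}

// Runtime probe: does a copy share its source's buffer?
bool copies_share_buffer() {
    const std::string x(64, 'a');  // long enough to avoid a small-string buffer
    const std::string y(x);
    return x.data() == y.data();
}
```

On a conforming library copies_share_buffer() should return false; the two functions are expected to agree with each other on any given build.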

answered Oct 30 '22 02:10 by Alexandre Pereira Nunes