Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

using .data() member of moved std::string doesn't work for small strings?

Why does the following program print garbage instead of hello? Interestingly, if I replace hello with hello how are you, then it prints hello how are you.

#include <string>
#include <iostream>

class Buffer
{
public:
    Buffer(std::string s):
        _raw(const_cast<char*>(s.data())),
        _buffer(std::move(s))
    {    
    }

    void Print()
    {
        std::cout << _raw;
    }

private:
    char* _raw;
    std::string _buffer;
};    

int main()
{   
    Buffer b("hello");
    b.Print();
}
like image 944
tcb Avatar asked Nov 30 '22 18:11

tcb


1 Answers

From your question, you imply a class invariant of Buffer. A class invariant is a relationship between the data members of a class that is assumed to always be true. In your case the implied invariant is:

assert(_raw == _buffer.data());

Joachim Pileborg correctly describes why this invariant is not maintained in your Buffer(std::string s) constructor (upvoted).

It turns out that maintaining this invariant is surprisingly tricky. Therefore my very first recommendation is that you redesign Buffer such that this invariant is no longer needed. The simplest way to do that is to compute _raw on the fly whenever you need it, instead of storing it. For example:

void Print()
{
    std::cout << _buffer.data();
}

That being said, if you really need to store _raw and maintain this invariant:

assert(_raw == _buffer.data());

The following is the path you will need to go down...

Buffer(std::string s)
    : _buffer(std::move(s))
    , _raw(const_cast<char*>(_buffer.data()))
{
}

Reorder your initialization such that you first construct _buffer by moving into it, and then point into _buffer. Do not point into the local s which will be destructed as soon as this constructor completes.

A very subtle point here is that despite the fact that I have reordered the initialization list in the constructor, I have not yet actually reordered the actual construction. To do that, I must reorder the list of data member declarations:

private:
    std::string _buffer;
    char* _raw;

It is this order, and not the order of the initialization list in the constructor that determines which member is constructed first. Some compilers with some warnings enabled will warn you if you attempt to order your constructor initialization list differently than the order the members will actually be constructed.

Now your program will run as expected, for any string input. However we are just getting started. Buffer is still buggy as your invariant is still not maintained. The best way to demonstrate this is to assert your invariant in ~Buffer():

~Buffer()
{
    assert(_raw == _buffer.data());
}

As it stands (and without the user-declared ~Buffer() I just recommended), the compiler helpfully supplies you with four more signatures:

Buffer(const Buffer&) = default;
Buffer& operator=(const Buffer&) = default;
Buffer(Buffer&&) = default;
Buffer& operator=(Buffer&&) = default;

And the compiler breaks your invariant for every one of these signatures. If you add ~Buffer() as I suggested, the compiler will not supply the move members, but it will still supply the copy members, and still get them wrong (though that behavior has been deprecated). And even if the destructor did inhibit the copy members (as it might in a future standard), the code is still dangerous as under maintenance someone might "optimize" your code like so:

#ifndef NDEBUG
    ~Buffer()
    {
        assert(_raw == _buffer.data());
    }
#endif

in which case the compiler would supply the buggy copy and move members in release mode.

To fix the code you must re-establish your class invariant every time _buffer is constructed, or outstanding pointers into it might be invalidated. For example:

Buffer(const Buffer& b)
    : _buffer(b._buffer)
    , _raw(const_cast<char*>(_buffer.data()))
{
}

Buffer& operator=(const Buffer& b)
{
    if (this != &b)
    {
        _buffer = b._buffer;
        _raw = const_cast<char*>(_buffer.data());
    }
    return *this;
}

If you add any members in the future which have the potential for invalidating _buffer.data(), you must remember to reset _raw. For example a set_string(std::string) member function would need this treatment.

Though you did not directly ask, your question alludes to a very important point in class design: Be aware of your class invariants, and what it takes to maintain them. Corollary: Minimize the number of invariants you have to manually maintain. And test that your invariants actually are maintained.

like image 141
Howard Hinnant Avatar answered Apr 19 '23 22:04

Howard Hinnant