Are there downsides to using std::string as a buffer?

I have recently seen a colleague of mine using std::string as a buffer:

std::string receive_data(const Receiver& receiver) {
  std::string buff;
  int size = receiver.size();
  if (size > 0) {
    buff.resize(size);
    const char* dst_ptr = buff.data();
    const char* src_ptr = receiver.data();
    memcpy((char*) dst_ptr, src_ptr, size);
  }
  return buff;
}

I guess he wants to take advantage of the returned string's automatic destruction, so he need not worry about freeing the allocated buffer.

This looks a bit strange to me since according to cplusplus.com the data() method returns a const char* pointing to a buffer internally managed by the string:

const char* data() const noexcept;

Memcpy-ing to a const char pointer? AFAIK this does no harm as long as we know what we do, but have I missed something? Is this dangerous?

duong_dajgja asked Jun 03 '19 07:06



4 Answers

Don't use std::string as a buffer.

It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):

  • std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
  • As a concrete example: before C++17, you can't even write through the pointer you get from data(), because it is a const charT*; so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
  • Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
  • Worse yet, it's confusing for whoever might use your function - they too may think what you're returning is a string, i.e. valid human-readable text.
  • A std::unique_ptr to an array (e.g. std::unique_ptr<char[]>) would be fine for your case, or even a std::vector. In C++17, you can use std::byte as the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
  • (Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into some linkage or runtime issues that you wouldn't with a different type for your buffer.

Also, your code may make two heap allocations instead of one (implementation dependent): one on string construction and another on resize(). But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation by using the constructor shown in @Jarod42's answer.
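
As a rough sketch of the std::vector alternative mentioned above: the Receiver struct below is a hypothetical stand-in that assumes only the data() and size() calls seen in the question, and unsigned char is used as the element type so the sketch also compiles before C++17 (std::byte could be substituted in C++17).

```cpp
#include <vector>

// Hypothetical stand-in for the question's Receiver; only data() and size()
// are assumed, mirroring the calls in the original snippet.
struct Receiver {
    const char* bytes;
    int length;
    const char* data() const { return bytes; }
    int size() const { return length; }
};

// Returns the received bytes in a container whose type says "raw bytes".
std::vector<unsigned char> receive_data(const Receiver& receiver) {
    const int size = receiver.size();
    if (size <= 0) {
        return {};  // nothing received: empty buffer
    }
    const char* first = receiver.data();
    // The iterator-range constructor allocates once and copies the bytes,
    // so no memcpy and no cast are needed.
    return std::vector<unsigned char>(first, first + size);
}
```

The vector still frees its storage automatically when it goes out of scope, which seems to be the property the colleague was after.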

einpoklum answered Oct 17 '22 17:10


You can completely avoid a manual memcpy by calling the appropriate constructor:

std::string receive_data(const Receiver& receiver) {
    return {receiver.data(), receiver.size()};
}

That even handles \0 in a string.
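
To make the \0 point concrete, here is a small sketch contrasting the (pointer, length) constructor with the C-string constructor. The helper names from_bytes and from_c_string are invented for this illustration only.

```cpp
#include <cstddef>
#include <string>

// Copies exactly `len` bytes, including any '\0' among them.
std::string from_bytes(const char* data, std::size_t len) {
    return std::string(data, len);
}

// Stops at the first '\0', because it treats `data` as a C string.
std::string from_c_string(const char* data) {
    return std::string(data);
}
```

Given the five bytes 'a', 'b', '\0', 'c', 'd', from_bytes keeps all five, while from_c_string keeps only the first two.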

BTW, unless content is actually text, I would prefer std::vector<std::byte> (or equivalent).

Jarod42 answered Oct 17 '22 17:10


Memcpy-ing to a const char pointer? AFAIK this does no harm as long as we know what we do, but is this good behavior and why?

The current code may have undefined behavior, depending on the C++ version. To avoid undefined behavior in C++14 and below, take the address of the first element, which yields a non-const pointer:

buff.resize(size);
memcpy(&buff[0], receiver.data(), size);

I have recently seen a colleague of mine using std::string as a buffer...

That was somewhat common in older code, especially circa C++03. There are several benefits and downsides to using a string like that. Depending on what you were doing, std::vector could be a bit anemic, so you sometimes used a string instead and accepted the extra overhead of char_traits.

For example, std::string is usually a faster container than std::vector on append, and in C++98 you could not, in practice, return a std::vector from a function cheaply, because the vector had to be constructed in the function and then copied out. Additionally, std::string lets you search with a richer assortment of member functions, like find_first_of and find_first_not_of. That was convenient when searching through arrays of bytes.
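
To illustrate the searching point, here is a small sketch. The function names are made up for this example, and the buffer is assumed to hold arbitrary bytes in a std::string.

```cpp
#include <cstddef>
#include <string>

// Returns the offset of the first CR or LF byte, or std::string::npos.
std::size_t first_line_break(const std::string& buffer) {
    return buffer.find_first_of("\r\n");
}

// Returns the offset of the first byte that is not a zero padding byte,
// or std::string::npos if the buffer is all zeros (or empty).
std::size_t first_non_padding(const std::string& buffer) {
    return buffer.find_first_not_of('\0');
}
```

std::vector has no such members; the closest equivalents are free algorithms such as std::find_first_of and std::find_if from <algorithm>.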

I think what you really want/need is SGI's rope class, but it never made it into the C++ standard library. GCC's libstdc++ provides it as an extension.


There is a lengthy discussion about whether this is legal in C++14 and below:

const char* dst_ptr = buff.data();
const char* src_ptr = receiver.data();
memcpy((char*) dst_ptr, src_ptr, size);

I know for certain it is not safe in GCC. I once did something like this in some self tests and it resulted in a segfault:

std::string buff("A");
...

char* ptr = (char*)buff.data();
size_t len = buff.size();

ptr[0] ^= 1;  // tamper with byte
bool tampered = HMAC(key, ptr, len, mac);

GCC put the single byte 'A' in register AL. The high 3 bytes were garbage, so the 32-bit register was 0xXXXXXX41. When I dereferenced ptr[0], GCC dereferenced the garbage address 0xXXXXXX41.

The two takeaways for me were: don't write half-assed self tests, and don't try to make data() a non-const pointer.

jww answered Oct 17 '22 18:10


Since C++17, data() has a non-const overload that returns a non-const char*.

Draft n4659 declares at [string.accessors]:

const charT* c_str() const noexcept;
const charT* data() const noexcept;
....
charT* data() noexcept;
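
A minimal sketch of what that overload permits: the helper name copy_into_string is hypothetical, and the __cplusplus check is only an assumption about how one might support both old and new standards in a single file.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies `size` bytes from `src` into a new std::string without casting
// away const on the destination pointer.
std::string copy_into_string(const char* src, std::size_t size) {
    std::string buff(size, '\0');  // sized up front, so there is no separate resize()
    if (size > 0) {
#if __cplusplus >= 201703L
        char* dst = buff.data();   // C++17: non-const overload of data()
#else
        char* dst = &buff[0];      // earlier standards: address of the first element
#endif
        std::memcpy(dst, src, size);
    }
    return buff;
}
```

Either branch writes only within [0, size), which is the range the string already owns after construction.
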

Serge Ballesta answered Oct 17 '22 19:10