Casting c_str() only works for short strings

I'm using a C library in C++ and wrote a wrapper. At one point I need to convert a std::string to a C-style string. There is a class with a function that returns a string. Casting the result of c_str() on the returned string works if the string is short, but not otherwise. Here is a simple, reduced example illustrating the issue:

#include <iostream>
#include <string>

class StringBox {
public:
  std::string getString() const { return text_; }

  StringBox(std::string text) : text_(text){};

private:
  std::string text_;
};

int main(int argc, char **argv) {
  const unsigned char *castString = NULL;
  std::string someString = "I am a loooooooooooooooooong string";  // Won't work
  // std::string someString = "hello";  // This one works

  StringBox box(someString);

  castString = (const unsigned char *)box.getString().c_str();
  std::cout << "castString: " << castString << std::endl;

  return 0;
}

Executing the file above prints this to the console:

castString:

whereas if I swap the commenting on someString, it correctly prints

castString: hello

How is this possible?

Potaito asked Mar 14 '16 17:03


3 Answers

You are invoking c_str() on a temporary string object returned by the getString() member function. The pointer returned by c_str() is only valid as long as the original string object exists, and the temporary is destroyed at the end of the full expression in which it was created. Once the assignment to castString has finished, castString is therefore a dangling pointer, and reading through it is undefined behavior.

So why does this work for short strings? I suspect that you're seeing the effects of the Short String Optimization, an optimization where, for strings shorter than some implementation-defined length, the character data is stored inside the bytes of the string object itself rather than on the heap. The temporary string object that was returned most likely lived on the stack, so when it was destroyed no deallocation occurred, and the stale pointer still points at stack bytes that happen to hold your old characters. This is consistent with what you're seeing, but it is still undefined behavior and it doesn't mean what you're doing is a good idea. :-)
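As a rough way to see whether a given standard library stores a particular string inline, the sketch below checks whether the character buffer lies inside the bytes of the string object. The helper name storedInline is hypothetical, and what it returns for short strings depends entirely on the implementation, so treat it as a diagnostic aid rather than something to build on.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character buffer of `s`
// lies inside the bytes of the std::string object itself, which is what
// a short-string-optimized implementation does for short strings.
bool storedInline(const std::string &s) {
  const void *begin = &s;      // first byte of the string object
  const void *end = &s + 1;    // one past the last byte of the object
  const void *data = s.data(); // start of the character buffer

  // std::less gives a well-defined total order even for unrelated pointers.
  std::less<const void *> before;
  return !before(data, begin) && before(data, end);
}
```

On implementations that use the optimization, storedInline(std::string("hello")) is likely to return true, while a string of, say, 100 characters has to live in a separately allocated buffer.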

templatetypedef answered Oct 10 '22 21:10


box.getString() is an anonymous temporary. The pointer returned by c_str() is only valid for the lifetime of the string object it was called on.

So in your case, the pointer is already invalid by the time you get to the std::cout. The behaviour of reading the pointer contents is undefined.

(Interestingly the behaviour of your short string is possibly different due to std::string storing short strings in a different way.)
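One way to keep the pointer valid until it has been used, sketched here under the assumption that making a copy is acceptable, is to bind the returned value to a named std::string first. The function name printable and the variable kept are made up for illustration; StringBox is the class from the question.

```cpp
#include <string>
#include <utility>

class StringBox {
public:
  explicit StringBox(std::string text) : text_(std::move(text)) {}
  std::string getString() const { return text_; } // returns by value, as in the question

private:
  std::string text_;
};

// Hypothetical helper: builds the line the question tries to print.
std::string printable(const StringBox &box) {
  // The named copy lives until the end of this function, so the pointer
  // obtained from it stays valid for every use below.
  const std::string kept = box.getString();
  const unsigned char *castString =
      reinterpret_cast<const unsigned char *>(kept.c_str());

  return "castString: " +
         std::string(reinterpret_cast<const char *>(castString));
}
```

The same idea applies in main: declare the std::string before castString and make sure it stays in scope for as long as the pointer is used.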

Bathsheba answered Oct 10 '22 22:10


As you return by value, box.getString() is a temporary, and so box.getString().c_str() is valid only during the expression; after that it is a dangling pointer.
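Because the temporary lives until the end of the full expression, passing the pointer straight into a call within that same expression is fine. The sketch below illustrates this with c_length, a hypothetical stand-in for a C library function; it is not part of any real library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

class StringBox {
public:
  explicit StringBox(std::string text) : text_(std::move(text)) {}
  std::string getString() const { return text_; } // returns by value, as in the question

private:
  std::string text_;
};

// Hypothetical stand-in for a C library function taking an unsigned char buffer.
std::size_t c_length(const unsigned char *s) {
  return std::strlen(reinterpret_cast<const char *>(s));
}

std::size_t lengthViaTemporary(const StringBox &box) {
  // The temporary returned by getString() is destroyed only after the
  // whole call expression has been evaluated, so c_length sees valid data.
  return c_length(
      reinterpret_cast<const unsigned char *>(box.getString().c_str()));
}
```

This only helps if the C function does not keep the pointer around after it returns.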

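You can fix that by returning a const reference instead, so that c_str() points into the member string, which lives as long as box does: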

const std::string& getString() const { return text_; }
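
Put together with the class from the question, that change might look like the sketch below. The helper readBack is hypothetical and only there to show the pointer being read after the expression that produced it; the pointer remains valid only while box is alive and its text_ member is not modified.

```cpp
#include <string>
#include <utility>

class StringBox {
public:
  explicit StringBox(std::string text) : text_(std::move(text)) {}

  // Returns a reference to the member instead of a temporary copy.
  const std::string &getString() const { return text_; }

private:
  std::string text_;
};

// Hypothetical helper: reads the characters back through the cast pointer.
std::string readBack(const StringBox &box) {
  const unsigned char *castString =
      reinterpret_cast<const unsigned char *>(box.getString().c_str());

  // box and its text_ member are still alive here, so castString is valid.
  return std::string(reinterpret_cast<const char *>(castString));
}
```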

Jarod42 answered Oct 10 '22 22:10