Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does calling std::string.c_str() on a function that returns a string not work?

I have the following code:

std::string getString() {
    std::string str("hello");
    return str;
}

int main() {
    const char* cStr = getString().c_str();
    std::cout << cStr << std::endl; // this prints garbage
}

What I thought would happen is that getString() would return a copy of str (getString() returns by value); thus, the copy of str would stay "alive" in main() until main() returns. This would make cStr point to a valid memory location: the underlying char[] or char* (or whatever) of the copy of str returned by getString() which, remains in main().

However, this is obviously not the case, as the program outputs garbage. So, the question is, when is str destroyed, and why?

like image 350
hacksoi Avatar asked Mar 14 '16 06:03

hacksoi


People also ask

How does c_str () work?

The c_str() method converts a string to an array of characters with a null character at the end. The function takes in no parameters and returns a pointer to this character array (also called a c-string).

What is the purpose of the c_str () member function of std::string?

The basic_string::c_str() is a builtin function in C++ which returns a pointer to an array that contains a null-terminated sequence of characters representing the current value of the basic_string object.

What does string c_str return?

The function c_str() returns a const pointer to a regular C string, identical to the current string. The returned string is null-terminated. Note that since the returned pointer is of type (C/C++ Keywords) const, the character data that c_str() returns cannot be modified.

Is string c_str null-terminated?

c_str returns a "C string". And C strings are always terminated by a null character. This is C standard. Null terminating strings.


3 Answers

Problem here is that you are returning a temporary variable and over that temporary variable you are doing c_str function.

"c_str() function Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object( [http://www.cplusplus.com/reference/string/string/c_str/][1]).

In this case your pointer is pointing to memory location which is now not present.

std::string getString() {
        std::string str("hello");
        return str; // Will create Temporary object as it's return by value}

    int main() {
         const char* cStr = getString().c_str(); // Temporary object is destroyed
        std::cout << cStr << std::endl; // this prints garbage }

Solution is to copy your temporary object to memory location properly(by creating local copy) and then use c_str over that object.

like image 34
user258367 Avatar answered Oct 06 '22 10:10

user258367


getString() would return a copy of str (getString() returns by value);

It's right.

thus, the copy of str would stay "alive" in main() until main() returns.

No, the returned copy is a temporary std::string, which will be destroyed at the end of the statement in which it was created, i.e. before std::cout << cStr << std::endl;. Then cStr becomes dangled, dereference on it leads to UB, anything is possible.

You can copy the returned temporary to a named variable, or bind it to a const lvalue-reference or rvalue-reference (the lifetime of the temporary will be extended until the reference goes out of scope). Such as:

std::string s1 = getString();    // s1 will be copy initialized from the temporary
const char* cStr1 = s1.c_str();
std::cout << cStr1 << std::endl; // safe

const std::string& s2 = getString(); // lifetime of temporary will be extended when bound to a const lvalue-reference
const char* cStr2 = s2.c_str();
std::cout << cStr2 << std::endl; // safe

std::string&& s3 = getString();  // similar with above
const char* cStr3 = s3.c_str();
std::cout << cStr3 << std::endl; // safe

Or use the pointer before the temporary gets destroyed. e.g.

std::cout << getString().c_str() << std::endl;  // temporary gets destroyed after the full expression

Here is an explanation from [The.C++.Programming.Language.Special.Edition] 10.4.10 Temporary Objects [class.temp]]:

Unless bound to a reference or used to initialize a named object, a temporary object is destroyed at the end of the full expression in which it was created. A full expression is an expression that is not a subexpression of some other expression.

The standard string class has a member function c_str() that returns a C-style, zero-terminated array of characters (§3.5.1, §20.4.1). Also, the operator + is defined to mean string concatenation. These are very useful facilities for strings . However, in combination they can cause obscure problems. For example:

void f(string& s1, string& s2, string& s3)
{

    const char* cs = (s1 + s2).c_str();
    cout << cs ;
    if (strlen(cs=(s2+s3).c_str())<8 && cs[0]==´a´) {
        // cs used here
    }

}

Probably, your first reaction is "but don’t do that," and I agree. However, such code does get written, so it is worth knowing how it is interpreted.

A temporary object of class string is created to hold s1 + s2 . Next, a pointer to a C-style string is extracted from that object. Then – at the end of the expression – the temporary object is deleted. Now, where was the C-style string allocated? Probably as part of the temporary object holding s1 + s2 , and that storage is not guaranteed to exist after that temporary is destroyed. Consequently, cs points to deallocated storage. The output operation cout << cs might work as expected, but that would be sheer luck. A compiler can detect and warn against many variants of this problem.

like image 52
songyuanyao Avatar answered Oct 06 '22 11:10

songyuanyao


As mentioned by others you are using a pointer to temporary after it has already been deleted - this is a classic example of heap after free use.

What I can add to others' answers is that you can easily detect such usage with gcc's or clang's address sanitizers.

Example:

#include <string>
#include <iostream>

std::string get()
{
  return "hello";
}

int main()
{
  const char* c = get().c_str();
  std::cout << c << std::endl;
}

sanitizer output:

=================================================================
==2951==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300000eff8 at pc 0x7f78e27869bb bp 0x7fffc483e670 sp 0x7fffc483de20
READ of size 6 at 0x60300000eff8 thread T0
    #0 0x7f78e27869ba in strlen (/usr/lib64/libasan.so.2+0x6d9ba)
    #1 0x39b4892ba0 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (/usr/lib64/libstdc++.so.6+0x39b4892ba0)
    #2 0x400dd8 in main /tmp/tmep_string/main.cpp:12
    #3 0x39aa41ed5c in __libc_start_main (/lib64/libc.so.6+0x39aa41ed5c)
    #4 0x400c48  (/tmp/tmep_string/a.out+0x400c48)

0x60300000eff8 is located 24 bytes inside of 30-byte region [0x60300000efe0,0x60300000effe)
freed by thread T0 here:
    #0 0x7f78e27ae6ea in operator delete(void*) (/usr/lib64/libasan.so.2+0x956ea)
    #1 0x39b489d4c8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (/usr/lib64/libstdc++.so.6+0x39b489d4c8)
    #2 0x39aa41ed5c in __libc_start_main (/lib64/libc.so.6+0x39aa41ed5c)

previously allocated by thread T0 here:
    #0 0x7f78e27ae1aa in operator new(unsigned long) (/usr/lib64/libasan.so.2+0x951aa)
    #1 0x39b489c3c8 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (/usr/lib64/libstdc++.so.6+0x39b489c3c8)
    #2 0x400c1f  (/tmp/tmep_string/a.out+0x400c1f)

SUMMARY: AddressSanitizer: heap-use-after-free ??:0 strlen
Shadow bytes around the buggy address:
  0x0c067fff9da0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9db0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9dc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9dd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9de0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c067fff9df0: fa fa fa fa fa fa fa fa fa fa fa fa fd fd fd[fd]
  0x0c067fff9e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==2951==ABORTING
like image 4
Patryk Avatar answered Oct 06 '22 10:10

Patryk