
c++ string constructor under the hood

Tags:

c++

I have a simple program:

#include <iostream>
#include <string>
#include <string.h>

using namespace std;

string read0() {
    int length = 4;

    char *cstr = new char[length];

    string str(cstr);

    delete[] cstr;

    return str;
}

string read1() {
    int length = 4;

    char cstr[length];

    memset(cstr, '-', 4);

    string str(cstr);

    return str;
}

string read2() {
    const char* cstr = "abcd";

    string str(cstr);

    return str;
}

In all three functions above, the string is constructed with basic_string(const CharT* s, const Allocator& alloc = Allocator()). When I use valgrind/massif to check heap usage, read0 only uses 4 bytes (from new), but read1 and read2 both use 29 bytes.

Here's some detailed output from massif:

For read0:

16.67% (4B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

->16.67% (4B) 0x400A0B: read0() (main.cpp:10)

->16.67% (4B) 0x400BC8: main (main.cpp:40)

For read1 and read2:

72.50% (29B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

->72.50% (29B) 0x4EE93B7: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)

->72.50% (29B) 0x4EEAD93: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)

->72.50% (29B) 0x4EEAE71: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)

->72.50% (29B) 0x400B81: read2() (main.cpp:34)

->72.50% (29B) 0x400BC8: main (main.cpp:40)

What causes this difference?

eltonsky asked Mar 15 '13 01:03


2 Answers

Someone please correct me if I'm wrong, but I think I know why this is.

In read0, you do:

char *cstr = new char[length];
string str(cstr);

You don't initialize cstr at all, so its contents are indeterminate. The string constructor takes a null-terminated C string and copies it up to the null terminator. I think it finds one at the very first element, so the resulting string is empty and allocates nothing; the 4 bytes massif reports would then be just the new char[length] buffer itself.
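To take the guesswork out of the read0 case, the buffer can be value-initialized so its first byte is guaranteed to be '\0'. This is only a sketch; read0_zeroed is a made-up name, not something from the question:

```cpp
#include <cassert>
#include <string>

// Variant of read0 whose buffer contents are well defined.
std::string read0_zeroed() {
    const int length = 4;
    char* cstr = new char[length]();  // the () value-initializes: every byte is '\0'
    assert(cstr[0] == '\0');          // so the C string is empty
    std::string str(cstr);            // constructs an empty string
    delete[] cstr;
    return str;
}
```

With the buffer zeroed, the constructed string is always empty, which matches the behaviour the question happened to observe.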

I think your read1 is similar. Since the buffer is not null-terminated, the constructor ends up finding a null terminator at some point past the end of the array, which ends up being 29 bytes.

I don't know why read2 is doing the same thing though, to be honest. It could be I'm wrong about the reason for read1 above and that 29 bytes (minus the 4 characters/bytes in the c-string) is the minimal operating cost of a string in your architecture and compiler's implementation of the STL.

In any case, to narrow down the possibilities, I suggest you null-terminate the strings in read0 and read1 and try your experiment again. Either allocate one extra element and set the last element to '\0', or use the alternative string constructor that takes a second parameter denoting how many characters to copy:

string(non_null_terminated_string, this_many_characters_to_copy)
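
Both suggestions could look roughly like this. The function names are hypothetical, and the buffers are filled with '-' purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Option 1: allocate one extra element and terminate the buffer by hand.
std::string from_terminated_buffer() {
    const std::size_t length = 4;
    char* cstr = new char[length + 1];    // room for the terminator
    std::memset(cstr, '-', length);
    cstr[length] = '\0';
    assert(std::strlen(cstr) == length);  // the C string is now well formed
    std::string str(cstr);                // stops at the '\0' written above
    delete[] cstr;
    return str;
}

// Option 2: leave the buffer unterminated and pass the count explicitly.
std::string from_counted_buffer() {
    const std::size_t length = 4;
    char cstr[length];
    std::memset(cstr, '-', length);
    return std::string(cstr, length);     // copies exactly `length` characters
}
```

Either way the constructor never reads past the end of the buffer, so the massif numbers should become repeatable.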

Jorge Israel Peña answered Oct 19 '22 19:10


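In read0 you are initializing str from cstr, which is uninitialized and in your run happened to start with a zero byte, so the string is empty (zero length) and allocates nothing beyond the 4 bytes from new. In read1 and read2 you are initializing str from non-empty strings. The amount of heap actually allocated in the latter cases (29 bytes) is larger than the 4 characters strictly require, since the implementation stores bookkeeping data alongside the characters, and allocators often round requests up to make heap management faster and simpler; for example, it's not unusual to allocate heap in chunks of 32 bytes (or some other round number).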
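
One way to account for exactly 29 bytes, assuming the old copy-on-write std::string in libstdc++ (the _Rep::_S_create frames in the massif output point that way): each non-empty string allocates a small header holding a length, a capacity, and a reference count, followed by the characters and a terminator. The struct below is a hypothetical stand-in for that header, not the library's actual definition:

```cpp
#include <cstddef>

// Hypothetical stand-in for the copy-on-write string header.
struct RepHeader {
    std::size_t length;
    std::size_t capacity;
    int refcount;
};

// Bytes requested for one string: header + characters + terminating '\0'.
constexpr std::size_t allocation_size(std::size_t chars) {
    return sizeof(RepHeader) + chars + 1;
}
```

On a typical x86-64 build sizeof(RepHeader) is 24 (two 8-byte fields plus a 4-byte int padded to 8), so allocation_size(4) comes to 24 + 4 + 1 = 29.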

Also, your read1 has a bug: if you want cstr to contain "----" then you need cstr[5], not cstr[4], because the array needs space for the string-terminating zero. You should then probably use strcpy(cstr, "----") rather than memset(cstr, '-', 4); otherwise cstr will not have the (needed) zero terminator.
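
A minimal sketch of read1 with that fix applied (read1_fixed is a name made up for this example):

```cpp
#include <cassert>
#include <cstring>
#include <string>

std::string read1_fixed() {
    char cstr[5];                // 4 dashes + 1 byte for the terminating '\0'
    std::strcpy(cstr, "----");   // copies the 4 characters and the terminator
    assert(cstr[4] == '\0');     // the buffer is a valid C string
    return std::string(cstr);
}
```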

jarmod answered Oct 19 '22 20:10