
string not optimized enough for string literals

Tags:

c++

string

stl

In a C++ project of mine I'm one step away from replacing all char* with std::string, but I've found one case where std::string fails miserably.

Imagine I have these 2 functions:

void foo1(const std::string& s)
{
    ...
}

void foo2(const char* s)
{
    ...
}

If I write something like this:

const char* SL = "Hello to all!";

foo1(SL); // calls malloc, memcpy, free
foo2(SL);

In foo1, SL will be implicitly converted to a std::string. This means that the std::string constructor will allocate memory and copy the string literal into that buffer. In foo2, none of this happens.

In most implementations std::string is supposed to be highly optimized (copy-on-write, for instance), but when I construct it from a const char* it is not. My question is: why does this happen? Am I missing something? Is my standard library not optimized enough, or is this optimization unsafe for some reason that I'm not aware of?

Pan. Christopoulos Charitos asked Dec 05 '11 13:12


2 Answers

Actually, your worries would go away(*) if you changed the type of the constant:

std::string const SL = "Hello to all!";

I added the const for you.

Now, calling foo1 will not involve any copying (at all), and calling foo2 can be achieved at little cost:

foo1(SL);         // by const-reference, same cost as passing a pointer
foo2(SL.c_str()); // simple pointer

If you want to move to std::string, don't just switch the function interfaces; switch the variables (and constants) too.

(*) The original answer assumed that SL was a global constant. If it is a variable local to a function, it can be made static if one truly wishes to avoid building it on each call.
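As a minimal sketch of the footnote's suggestion, the constant can live in a function-local static so that it is built once and then reused. The names greeting and demo are hypothetical, and foo1 and foo2 are stand-ins for the question's functions that simply return what they receive so the result can be inspected:

```cpp
#include <cassert>
#include <string>

// Stand-ins for the question's functions; each returns what it received
// so the caller can inspect it.
const std::string* foo1(const std::string& s) { return &s; }
const char* foo2(const char* s) { return s; }

// Hypothetical accessor: SL is constructed on the first call only.
const std::string& greeting() {
    static const std::string SL = "Hello to all!";
    return SL;
}

// Hypothetical demo: passes the same constant to both functions.
void demo() {
    const std::string& SL = greeting();
    assert(foo1(SL) == &SL);                 // bound by reference, no temporary created
    assert(foo2(SL.c_str()) == SL.c_str());  // pointer into SL's own buffer
    assert(&greeting() == &SL);              // later calls reuse the same object
}
```

Calling demo() from main exercises both paths. The cost of constructing the std::string is paid once, on the first call to greeting().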

Matthieu M. answered Oct 21 '22 16:10


The problem is that the std::string class has no way to tell whether a const char* pointer refers to a string literal or not:

const char *a = "Hello World";
const char *b = new char[20];

The char* pointer might become invalid at any time (for example, when it points to a local buffer and the function or scope ends), so std::string must become the exclusive owner of the characters. This can only be achieved by copying.

The following example demonstrates why the copy is necessary:

#include <cstring>
#include <string>

std::string getHelloWorld() {
  char *hello = new char[64];
  std::strcpy(hello, "Hello World");
  std::string result = hello;  // copies the characters; without the copy, result would be garbage after delete[]
  delete[] hello;
  return result;
}
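The ownership argument can also be checked directly. The following is a small sketch (the function name stringOwnsItsCopy is hypothetical) that overwrites the source buffer after constructing the std::string and confirms that the string is unaffected:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns true when the std::string is unaffected by later writes
// to the buffer it was constructed from.
bool stringOwnsItsCopy() {
    char buffer[64];
    std::strcpy(buffer, "Hello World");
    std::string copy = buffer;        // copies the characters out of buffer
    std::strcpy(buffer, "Goodbye");   // overwrite the source afterwards
    return copy == "Hello World"      // the string kept the original text
        && copy.data() != buffer;     // and it stores it somewhere else
}
```

If std::string merely stored the pointer, the first comparison would see "Goodbye" instead.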
Karel Petranek answered Oct 21 '22 18:10