
Default advice for using C-style string literals vs. constructing unnamed std::string objects?

C++14 introduced a number of user-defined literals, one of which is the "s" literal suffix for creating std::string objects. According to the documentation, its behavior is exactly the same as constructing a std::string object, like so:

auto str = "Hello World!"s; // RHS is equivalent to: std::string{ "Hello World!" }
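For reference, a minimal sketch of what using the suffix involves. The operator is only found once a using-directive such as using namespace std::string_literals; is in scope. The function name make_greeting is made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// operator""s is declared in the inline namespace std::literals::string_literals,
// so a using-directive like this one must be visible before the suffix can be used.
using namespace std::string_literals;

// Hypothetical helper, named for this example only.
inline std::string make_greeting() {
    auto str = "Hello World!"s;  // str is deduced as std::string, not const char*
    static_assert(std::is_same<decltype(str), std::string>::value,
                  "the s suffix yields a std::string");
    assert(str.size() == 12);    // the length comes from the literal itself
    return str;
}
```

Without the using-directive the same line would not compile, because an unqualified suffix is only looked up among the literal operators in scope.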

Of course, constructing an unnamed std::string object was possible before C++14, but because the C++14 syntax is so much simpler, I think far more people will now consider constructing std::string objects on the spot. That's why I thought it makes sense to ask this.

So my question is simple: in which cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?


Example 1:

Consider the following:

void foo(std::string arg);

foo("bar");  // option 1
foo("bar"s); // option 2

If I'm correct, the first method will call the appropriate constructor overload of std::string to create an object inside foo's scope, and the second method will construct an unnamed string object first, and then move-construct foo's argument from that. I'm sure that compilers are very good at optimizing this kind of thing, but the second version still seems to involve an extra move compared to the first (not that a move is expensive, of course). Then again, after compiling this with a reasonable compiler, the end result is most likely highly optimized and free of redundant moves/copies anyway.
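One way to check this guess is to count constructor calls with a small tracing type. The sketch below uses Traced, a hypothetical stand-in for std::string (the real class cannot be instrumented like this), and Traced{"bar"} plays the role of "bar"s. The number of moves depends on whether the compiler elides them, which is permitted before C++17 and guaranteed from C++17 on, so no specific output is claimed here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for std::string that counts how it gets constructed.
struct Traced {
    static int from_literal;  // converting constructions from const char*
    static int moves;         // move constructions
    std::string text;
    Traced(const char* s) : text(s) { ++from_literal; }
    Traced(Traced&& other) noexcept : text(std::move(other.text)) { ++moves; }
};
int Traced::from_literal = 0;
int Traced::moves = 0;

struct Counts { int from_literal; int moves; };

// Same shape as foo in the question: the parameter is taken by value.
void foo(Traced arg) { assert(arg.text == "bar"); }

// Option 1: the parameter is initialized from the raw literal.
Counts count_option1() {
    Traced::from_literal = Traced::moves = 0;
    foo("bar");
    return {Traced::from_literal, Traced::moves};
}

// Option 2: an unnamed object is written out explicitly, mirroring foo("bar"s).
Counts count_option2() {
    Traced::from_literal = Traced::moves = 0;
    foo(Traced{"bar"});
    return {Traced::from_literal, Traced::moves};
}
```

Both options should report one converting construction and the same number of moves (zero wherever the move is elided), which suggests that for a by-value parameter the second form does not cost an extra move in practice.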

Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.
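A quick sketch of how overload resolution treats the two calls, assuming a hypothetical pair of overloads taking const std::string& and std::string&& (the tag strings they return are made up for the illustration). Because "bar" has to be converted to a temporary std::string anyway, both calls end up binding to the rvalue-reference overload.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;

// Hypothetical overload pair; each returns a tag naming the overload chosen.
inline const char* foo(const std::string&) { return "const&"; }
inline const char* foo(std::string&&)      { return "&&"; }

inline void check_overloads() {
    std::string named = "bar";
    assert(foo(named) == "const&"s);  // a named object is an lvalue
    assert(foo("bar") == "&&"s);      // the literal converts to a temporary
    assert(foo("bar"s) == "&&"s);     // the s literal is already a temporary
}
```

So with such an overload set the choice between "bar" and "bar"s does not change which overload runs; only passing a named string selects the const& version.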


Example 2:

Consider the following:

std::cout << "Hello World!" << std::endl;  // option 1
std::cout << "Hello World!"s << std::endl; // option 2

In this case, the std::string object is probably passed to cout's operator<< via rvalue reference, and the first option probably passes a pointer, so both are very cheap operations, but the second one has the extra cost of constructing an object first. It's probably a safer way to go though (?).


In all cases of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well. This is more of an issue in the second example though, as in the first example, an std::string object will be constructed in both cases anyway. In practice, getting an exception from constructing a string object is very unlikely, but still could be a valid argument in certain cases.

If you can think of more examples to consider, please include them in your answer. I'm interested in general advice regarding the usage of unnamed std::string objects, not just these two particular cases. I only included these to point out some of my thoughts on the topic.

Also, if I got something wrong, feel free to correct me as I'm not by any means a C++ expert. The behaviors I described are only my guesses on how things work, and I didn't base them on actual research or experimenting really.

asked Aug 20 '15 at 10:08 by notadam


3 Answers

In which cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?

A std::string literal is a good idea when you specifically want a variable of type std::string, whether for

  • modifying the value later (auto s = "123"s; s += '\n';)

  • the richer, intuitive and less error-prone interface (value semantics, iterators, find, size etc)

    • value semantics means ==, <, copying etc. work on the values, unlike the pointer/by-reference semantics after C-string literals decay to const char*s

  • calling some_templated_function("123"s) would concisely ensure a <std::string> instantiation, with the argument being able to be handled using value semantics internally

    • if you know other code's instantiating the template for std::string anyway, and it's of significant complexity relative to your resource constraints, you might want to pass a std::string too, to avoid an unnecessary instantiation for const char* as well, but it's rare to need to care

  • values containing embedded NULs
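The points in the list above can be sketched in a few assertions. The template deduced_string and the function check_string_literal_benefits are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

using namespace std::string_literals;

// Hypothetical template: reports whether it was instantiated for std::string.
template <typename T>
constexpr bool deduced_string(const T&) {
    return std::is_same<T, std::string>::value;
}

inline void check_string_literal_benefits() {
    // Modifying the value later.
    auto s = "123"s;
    s += '\n';
    assert(s.size() == 4);

    // Value semantics: == and < compare the characters, not addresses.
    assert("123"s == "123"s);
    assert("abc"s < "abd"s);

    // Template deduction: a raw literal deduces an array type, not std::string.
    assert(!deduced_string("123"));
    assert(deduced_string("123"s));

    // Embedded NULs: the suffix receives the full length of the literal,
    // while the const char* constructor stops at the first NUL.
    assert("ab\0cd"s.size() == 5);
    assert(std::string("ab\0cd").size() == 2);
}
```

Calling check_string_literal_benefits() from main runs the checks; the embedded-NUL case is the one place where "..."s and std::string{"..."} visibly differ.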

A C-style string literal might be preferred where:

  • pointer-style semantics are wanted (or at least not a problem)

  • the value's only going to be passed to functions expecting const char* anyway, or std::string temporaries will get constructed anyway and you don't care that you're giving your compiler's optimiser one extra hurdle to leap before it can construct the string at compile or load time and reuse the same std::string instance (e.g. when passing to functions by const-reference) - again, it's rare to need to care

  • (another rare and nasty hack) you're somehow leveraging your compiler's string pooling behaviour, e.g. if it guarantees that for any given translation unit the const char* to string literals will only (but of course always) differ if the text differs

    • you can't really get the same from std::string .data()/.c_str(), as the same address may be associated with different text (and different std::string instances) during the program execution, and std::string buffers at distinct addresses may contain the same text

  • you benefit from having the pointer remain valid after a std::string would leave scope and be destroyed. For example, given enum My_Enum { Zero, One };

    • const char* str(My_Enum e) { return e == Zero ? "0" : "1"; } is safe

    • const char* str(My_Enum e) { return e == Zero ? "0"s.c_str() : "1"s.c_str(); } isn't, because the temporaries are destroyed before the caller can use the pointer

    • std::string str(My_Enum e) { return e == Zero ? "0"s : "1"s; } smacks of premature pessimisation in always using dynamic allocation (sans SSO, or for longer text)

  • you're leveraging compile-time concatenation of adjacent C-string literals (e.g. "abc" "xyz" becomes one contiguous const char[] literal "abcxyz") - this is particularly useful inside macro substitutions

  • you're memory constrained and/or don't want to risk an exception or crash during dynamic memory allocation
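Two of the points above, pointer validity and compile-time concatenation, are easy to show in a short sketch. GREETING_PREFIX, greeting and check_c_literals are hypothetical names used only here; str and My_Enum follow the example in the list.

```cpp
#include <cassert>
#include <cstring>

enum My_Enum { Zero, One };

// Safe: string literals have static storage duration, so the returned
// pointer stays valid after the function returns.
inline const char* str(My_Enum e) { return e == Zero ? "0" : "1"; }

// Adjacent literals are merged into one array during translation.
static_assert(sizeof("abc" "xyz") == 7, "six characters plus the terminating NUL");

// The same mechanism lets a macro supply part of a literal.
#define GREETING_PREFIX "Hello, "
inline const char* greeting() { return GREETING_PREFIX "World!"; }

inline void check_c_literals() {
    assert(std::strcmp(str(Zero), "0") == 0);
    assert(std::strcmp(str(One), "1") == 0);
    assert(std::strcmp(greeting(), "Hello, World!") == 0);
}
```

None of this allocates, which is also why the last bullet about memory-constrained code applies.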

Discussion

[basic.string.literals] 21.7 lists:

string operator "" s(const char* str, size_t len);

Returns: string{str,len}

Basically, using ""s is calling a function that returns a std::string by value - crucially, you can bind the result to a const reference or an rvalue reference, but not to a non-const lvalue reference.
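The binding rules can be checked with standard type traits. In this sketch std::is_constructible<Ref, std::string> asks whether a reference of type Ref can be initialized from an rvalue std::string; bound_lengths is a hypothetical helper added for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

using namespace std::string_literals;

// The literal expression itself has type std::string (returned by value).
static_assert(std::is_same<decltype("x"s), std::string>::value,
              "an s literal is a std::string prvalue");

// Which reference types can bind to such an rvalue?
static_assert(std::is_constructible<const std::string&, std::string>::value,
              "const lvalue reference binds");
static_assert(std::is_constructible<std::string&&, std::string>::value,
              "rvalue reference binds");
static_assert(!std::is_constructible<std::string&, std::string>::value,
              "non-const lvalue reference does not bind");

// Hypothetical helper showing the two bindings that compile.
inline std::size_t bound_lengths() {
    const std::string& cref = "abc"s;  // temporary lives as long as cref
    std::string&& rref = "de"s;        // likewise for rref
    // std::string& bad = "f"s;        // would not compile
    assert(cref.size() == 3);
    return cref.size() + rref.size();
}
```

In both working cases the temporary's lifetime is extended to that of the reference, so the locals are safe to use within the function.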

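When used to call void foo(std::string arg);, arg will indeed be move constructed (a move that compilers are allowed to elide, and that C++17's guaranteed copy elision removes entirely).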

Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.

It doesn't matter much which you choose. Maintenance-wise, if foo(const std::string&) is ever changed to foo(const char*), only foo("xyz"); invocations will seamlessly continue working. There are very few even vaguely plausible reasons for such a change, though: so that C code could call it too (but it would still be a bit mad not to keep providing a foo(const std::string&) overload for existing client code); so that it could be implemented in C (perhaps); or to remove the dependency on the <string> header (irrelevant with modern computing resources).

std::cout << "Hello World!" << std::endl; // option 1

std::cout << "Hello World!"s << std::endl; // option 2

The former will call operator<<(std::ostream&, const char*), directly accessing the constant string literal data, with the only disadvantage being that the streaming may have to scan for the terminating NUL. "option 2" would match a const-reference overload and implies construction of a temporary, though compilers might be able to optimise it so they're not doing that unnecessarily often, or even effectively create the string object at compile time (which might only be practical for strings short enough to use an in-object Short String Optimisation (SSO) approach). If they're not doing such optimisations already, the potential benefit and hence pressure/desire to do so is likely to increase.
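The difference between the two overloads becomes visible when the text contains a NUL. A minimal sketch, using a hypothetical helper streamed that writes into a std::ostringstream instead of std::cout so the result can be inspected:

```cpp
#include <cassert>
#include <sstream>
#include <string>

using namespace std::string_literals;

// Hypothetical helper: streams a value into a buffer and returns what was written.
template <typename T>
std::string streamed(const T& value) {
    std::ostringstream out;
    out << value;  // const char* overload for literals, std::string overload otherwise
    return out.str();
}

inline void check_streaming() {
    // For ordinary text the two options write the same characters.
    assert(streamed("Hello World!") == streamed("Hello World!"s));

    // The const char* overload scans for the terminating NUL and stops there;
    // the std::string overload writes size() characters.
    assert(streamed("ab\0cd").size() == 2);
    assert(streamed("ab\0cd"s).size() == 5);
}
```

For plain text without embedded NULs the output is identical, so the choice in example 2 comes down to the cost of constructing the temporary.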

answered Oct 02 '22 at 03:10 by Tony Delroy


First, I believe the answer is opinion based!

For your example 1 you have already mentioned all the important arguments for using the new s literal. And yes, I expect the result to be the same, so I see no need to state at the call site that I want a std::string.

One argument could be that, where a constructor is declared explicit, automatic type conversion will not happen. In that situation an s literal is helpful.
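The remark above is brief, so the following is only one possible reading of it, and it is an assumption rather than the answerer's own example. A closely related case where a raw literal fails to convert is a parameter type that is itself constructed from std::string: going from const char* to that type would need two user-defined conversions, which implicit conversion does not allow. Label and label_length are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

using namespace std::string_literals;

// Hypothetical type with a (non-explicit) constructor taking std::string.
struct Label {
    Label(std::string t) : text(std::move(t)) {}
    std::string text;
};

inline std::size_t label_length(Label label) { return label.text.size(); }

// std::string -> Label is a single user-defined conversion: allowed.
static_assert(std::is_convertible<std::string, Label>::value,
              "a std::string converts implicitly");
// const char* -> std::string -> Label would need two: not allowed.
static_assert(!std::is_convertible<const char*, Label>::value,
              "a raw literal does not convert implicitly");

inline void check_label() {
    assert(label_length("abc"s) == 3);  // compiles thanks to the s suffix
    // label_length("abc");             // would not compile
}
```

If the constructor of Label were declared explicit, neither call would compile and the argument would have to be written as Label{"abc"}.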

But it is a matter of taste, I think!

For your example 2 I tend to use the "old" C-string version, because creating a std::string object has overhead. Passing a pointer to the string to cout is well defined, and I see no use case where the std::string would bring any benefit.

So my personal advice for now (new information becomes available every day :-) ) is to use a C-string if it fits my needs exactly. That means the string is constant, is never copied or modified, and is only used "as is". In that case a std::string simply has no benefit.

The s literal comes into use where I need to state that the value is a std::string.

In short: I do not use a std::string if I have no need for the additional features which std::string offers over an old C-string. For me the point is not the s literal itself but std::string vs. C-strings in general.

Just as a remark: I have to program a lot on very small embedded devices, in particular 8-bit AVRs. Using std::string there results in a lot of overhead. If I have to use a dynamic container because I need its features, it is very good to have one which is well implemented and tested. But if I have no need for it, it is simply too expensive to use.

On a big target like an x86 box, the cost of using std::string instead of a C-string seems negligible. But keeping a small device in mind gives you an idea of what is really happening on big machines as well.

Only my two cents!

answered Oct 02 '22 at 04:10 by Klaus


In which cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?

What is or is not a good idea tends to vary with the situation.

My choice is to use raw literals whenever they are enough (whenever I don't need anything other than a literal). If I need to access anything other than a pointer to the first element of the string (the string length, its back, iterators or anything else), then I use a std::string literal.

In all cases of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well.

Uhh ... while the code could indeed throw, this is irrelevant except in very special circumstances (for example, embedded code running at - or close to - the memory limits of the hardware, or a high-availability application/environment).

In practice, I have never had an out of memory condition, from writing auto a = "abdce"s; or other similar code.

In conclusion, don't bother with the exception safety of out-of-memory situations coming from instantiating a std::string literal. If you encounter an out of memory situation doing this, change the code when you find the error.

answered Oct 02 '22 at 04:10 by utnapistim