Default advice for using C-style string literals vs. constructing unnamed std::string objects?

Tags:

c++

string

stdstring

string-literals

c-strings

So C++14 introduced a number of user-defined literals, one of which is the "s" literal suffix for creating std::string objects. According to the documentation, its behavior is exactly the same as constructing an std::string object, like so:

auto str = "Hello World!"s; // RHS is equivalent to: std::string{ "Hello World!" }
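For anyone trying the snippet above, here is a minimal sketch, assuming a C++14 or later compiler. The s suffix is declared in an inline namespace of std, so a using-directive is needed before "..."s compiles. The helper names hello_literal and hello_constructed are hypothetical and exist only for this comparison.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The "s" suffix is declared in std::literals::string_literals. Because
// std::literals is an inline namespace, std::string_literals names the same namespace.
using namespace std::string_literals;

// The literal's type is std::string itself, not const char[N] or const char*.
static_assert(std::is_same<decltype("Hello World!"s), std::string>::value,
              "\"...\"s yields a std::string");

// Hypothetical helpers, used only to compare the two spellings.
std::string hello_literal() { return "Hello World!"s; }
std::string hello_constructed() { return std::string{"Hello World!"}; }
```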

Of course, constructing an unnamed std::string object could be done before C++14, but because the C++14 way is so much simpler, I think many more people will actually consider constructing std::string objects on the spot than before, which is why I thought it made sense to ask this.

So my question is simple: in what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?

Example 1:

Consider the following:

void foo(std::string arg);

foo("bar");  // option 1
foo("bar"s); // option 2

If I'm correct, the first method will call the appropriate constructor overload of std::string to create an object inside foo's scope, and the second will construct an unnamed string object first and then move-construct foo's argument from it. I'm sure compilers are very good at optimizing stuff like this, but still, the second version seems to involve an extra move compared with the first (not that a move is expensive, of course). Then again, after compiling this with a reasonable compiler, the end result is most likely highly optimized and free of redundant moves/copies anyway.

Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.

Example 2:

Consider the following:

std::cout << "Hello World!" << std::endl;  // option 1
std::cout << "Hello World!"s << std::endl; // option 2

In this case, the std::string object is probably passed to cout's operator via rvalue reference, while the first option probably passes a pointer, so both are very cheap operations, but the second one has the extra cost of constructing an object first. It's probably a safer way to go, though (?).

In all cases, of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well. This is more of an issue in the second example, though, since in the first example an std::string object will be constructed in both cases anyway. In practice, getting an exception from constructing a string object is very unlikely, but it could still be a valid argument in certain cases.

If you can think of more examples to consider, please include them in your answer. I'm interested in general advice regarding the usage of unnamed std::string objects, not just these two particular cases. I only included these to point out some of my thoughts on this topic.

Also, if I got something wrong, feel free to correct me as I'm not by any means a C++ expert. The behaviors I described are only my guesses on how things work, and I didn't base them on actual research or experimenting really.


asked Aug 20 '15 10:08

notadam

3 Answers

"In what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?"

A std::string literal is a good idea when you specifically want an object of type std::string, whether for:

- modifying the value later (auto s = "123"s; s += '\n';)
- a richer, more intuitive and less error-prone interface (value semantics, iterators, find, size, etc.)
  - value semantics means ==, <, copying, etc. operate on the values, unlike the pointer/by-reference semantics you get after C-string literals decay to const char*s
- calling some_templated_function("123"s) concisely ensures a <std::string> instantiation, so the argument can be handled with value semantics internally
  - if you know other code is instantiating the template for std::string anyway, and it's of significant complexity relative to your resource constraints, you might want to pass a std::string too, to avoid an unnecessary extra instantiation for const char*; but it's rare to need to care
- values containing embedded NULs
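A small sketch of several points from that list, assuming C++14 or later. deduced_as_std_string is a hypothetical stand-in for some_templated_function, and the other function names are likewise illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

using namespace std::string_literals;

// Hypothetical stand-in for some_templated_function: reports whether the
// template parameter was deduced as std::string.
template <typename T>
bool deduced_as_std_string(T) { return std::is_same<T, std::string>::value; }

// Modifying the value later.
std::string with_newline() {
    auto s = "123"s;  // s is a std::string
    s += '\n';        // appends one character
    return s;
    // With `auto s = "123";`, s would be a const char*, and `s += '\n'` would
    // compile but advance the pointer by the value of '\n' (undefined behaviour).
}

// Value semantics: == and < compare contents, not addresses.
bool values_compare() { return "abc"s == "abc"s && "abc"s < "abd"s; }

// Embedded NULs: the suffix receives the literal's full length from the compiler...
std::size_t size_via_suffix() { return "a\0b"s.size(); }               // 3
// ...whereas the const char* constructor stops at the first NUL.
std::size_t size_via_pointer() { return std::string("a\0b").size(); }  // 1
```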

A C-style string literal might be preferred where:

- pointer-style semantics are wanted (or at least not a problem)
- the value is only going to be passed to functions expecting const char* anyway, or std::string temporaries will be constructed anyway and you don't mind giving your compiler's optimiser one extra hurdle to leap to achieve compile- or load-time construction when there's potential to reuse the same std::string instance (e.g. when passing to functions by const reference); again, it's rare to need to care
- (another rare and nasty hack) you're somehow leveraging your compiler's string pooling behaviour, e.g. if it guarantees that within any given translation unit the const char*s to string literals will differ only (but of course always) when the text differs
  - you can't really get the same from std::string's .data()/.c_str(), as the same address may be associated with different text (and different std::string instances) during the program's execution, and std::string buffers at distinct addresses may contain the same text
- you benefit from having the pointer remain valid after a std::string would leave scope and be destroyed (e.g. given enum My_Enum { Zero, One };, const char* str(My_Enum e) { return e == Zero ? "0" : "1"; } is safe, but const char* str(My_Enum e) { return e == Zero ? "0"s.c_str() : "1"s.c_str(); } isn't, because the returned pointer dangles once the temporary std::string is destroyed; and std::string str(My_Enum e) { return e == Zero ? "0"s : "1"s; } smacks of premature pessimisation, as it dynamically allocates unless the text is short enough for SSO)
- you're leveraging compile-time concatenation of adjacent C-string literals (e.g. "abc" "xyz" becomes one contiguous const char[] literal "abcxyz"); this is particularly useful inside macro substitutions
- you're memory constrained and/or don't want to risk an exception or crash during dynamic memory allocation
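A sketch of two of those points, reusing the My_Enum/str example from the list. APP_NAME, LOG_PREFIX and startup_message are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstring>

enum My_Enum { Zero, One };

// Safe: string literals have static storage duration, so the returned
// pointer stays valid after the function returns.
const char* str(My_Enum e) { return e == Zero ? "0" : "1"; }
// By contrast, returning "0"s.c_str() would hand back a pointer into a
// temporary std::string that is destroyed at the end of the return statement.

// Adjacent literals are concatenated during translation, including after
// macro substitution. APP_NAME and LOG_PREFIX are hypothetical macros.
#define APP_NAME "demo"
#define LOG_PREFIX "[" APP_NAME "] "

const char* startup_message() { return LOG_PREFIX "started"; }

// The result is a single array: 14 characters plus the terminating NUL.
static_assert(sizeof(LOG_PREFIX "started") == 15, "one contiguous literal");
```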

Discussion

[basic.string.literals] 21.7 lists:

string operator "" s(const char* str, size_t len);

Returns: string{str,len}
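To make the Returns clause concrete, here is a sketch of a hypothetical user-defined literal, _str, written to follow the quoted wording; it is not a claim about how any particular standard library implements s. Because the compiler passes the literal's length, embedded NULs survive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using namespace std::string_literals;

// Hypothetical literal operator following the "Returns: string{str,len}" wording.
// Suffixes not starting with an underscore are reserved for the standard
// library, hence "_str".
std::string operator""_str(const char* str, std::size_t len) {
    return std::string{str, len};  // length supplied by the compiler, no NUL scan
}
```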

Basically, using ""s is calling a function that returns a std::string by value; crucially, you can bind a const lvalue reference or an rvalue reference to the result, but not a non-const lvalue reference.
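A sketch of those binding rules; binding_demo is a hypothetical name, and the commented-out line is the one a conforming compiler rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using namespace std::string_literals;

// Hypothetical helper exercising each kind of reference binding.
std::size_t binding_demo() {
    const std::string& cr = "abc"s;  // OK: const lvalue reference; temporary's lifetime is extended
    std::string&& rr = "defg"s;      // OK: rvalue reference; lifetime extended as well
    // std::string& lr = "abc"s;     // error: a non-const lvalue reference can't bind to a prvalue
    rr += '!';                       // the temporary bound to rr may be modified
    return cr.size() + rr.size();    // 3 + 5
}
```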

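When used to call void foo(std::string arg);, arg will indeed be move-constructed from the temporary in C++14, although compilers are allowed to elide that move (and routinely do); since C++17, guaranteed copy elision initialises arg directly from the returned prvalue, so no move happens at all.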

"Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong."

It doesn't matter much which you choose: with overloads taking const std::string& and std::string&&, both foo("bar") and foo("bar"s) select the std::string&& overload, because foo("bar") also creates a temporary std::string. Maintenance-wise, if foo(const std::string&) is ever changed to foo(const char*), only foo("xyz"); invocations will continue to work seamlessly, but there are very few even vaguely plausible reasons for such a change. So C code could call it too? Still, it'd be a bit mad not to keep providing a foo(const std::string&) overload for existing client code. So it could be implemented in C? Perhaps. To remove the dependency on the <string> header? Irrelevant with modern computing resources.
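A sketch of the overload case from the question, using a hypothetical foo overload set whose return values only record which overload was picked. Under the usual overload-ranking rules, binding an rvalue reference to a temporary beats binding a const lvalue reference, so both call styles should land on the std::string&& overload; only a named (lvalue) string picks the other one.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;

// Hypothetical overload set; the return value records which one was chosen.
int foo(const std::string&) { return 1; }
int foo(std::string&&) { return 2; }

// A named string is an lvalue, so it cannot bind to std::string&&.
int call_with_lvalue() {
    std::string named = "bar";
    return foo(named);
}
```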

std::cout << "Hello World!" << std::endl; // option 1

std::cout << "Hello World!"s << std::endl; // option 2

The former calls operator<<(std::ostream&, const char*), directly accessing the constant string literal data, the only disadvantage being that the streaming may have to scan for the terminating NUL. Option 2 would match the const std::string& overload and implies construction of a temporary, though compilers might be able to optimise it so they're not doing that unnecessarily often, or even effectively create the string object at compile time (which might only be practical for strings short enough to use an in-object Short String Optimisation (SSO) approach). If they're not doing such optimisations already, the potential benefit, and hence the pressure to do so, is likely to increase.
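A sketch comparing the two options by streaming into a std::ostringstream instead of std::cout, so the output can be checked; the helper names are hypothetical. For ordinary text the output is identical. The two only diverge when the literal contains an embedded NUL, since the const char* overload stops at the first NUL while the std::string overload writes size() characters.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

using namespace std::string_literals;

// Option 1: operator<<(std::ostream&, const char*) writes characters up to the first NUL.
std::string stream_c_literal() {
    std::ostringstream out;
    out << "Hello World!";
    return out.str();
}

// Option 2: the const std::string& overload writes size() characters.
std::string stream_s_literal() {
    std::ostringstream out;
    out << "Hello World!"s;
    return out.str();
}

// The options only diverge when the literal contains an embedded NUL.
std::size_t streamed_length_c() {
    std::ostringstream out;
    out << "a\0b";
    return out.str().size();
}
std::size_t streamed_length_s() {
    std::ostringstream out;
    out << "a\0b"s;
    return out.str().size();
}
```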


answered Oct 02 '22 03:10

Tony Delroy

First, I believe the answer is opinion-based!

For your example 1, you already mentioned all the important arguments for using the new s literal. And yes, I expect the result to be the same, so I see no need to state explicitly at the call site that I want a std::string.

One argument could be that a constructor is declared explicit, so an automatic type conversion will not happen. In that situation, an s literal is helpful.
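The explicit-constructor case isn't spelled out above. A closely related, concrete situation where the s literal helps is the rule that at most one implicit user-defined conversion is applied. The Name type and greet function below are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <utility>

using namespace std::string_literals;

// Hypothetical type that is implicitly constructible from std::string only.
struct Name {
    Name(std::string s) : value(std::move(s)) {}
    std::string value;
};

std::string greet(const Name& n) { return "Hello, " + n.value; }

// greet("Bob");  // error: const char* -> std::string -> Name would need two
//                // user-defined conversions, but only one is applied implicitly
```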

But it is a matter of taste, I think!

For your example 2, I tend to use the "old" C-string version, because creating a std::string object has overhead. Passing a pointer to the string to cout is well defined, and I see no use case where a std::string would give me any benefit.

So my personal advice at the moment (new information becomes available every day :-) ) is to use a C-string if it fits my needs exactly. That means the string is constant, will never be copied or modified, and is only used "as is". In that case a std::string simply has no benefit.

The 's' literal comes into play where I need to specify that something is a std::string.

In short: I do not use a std::string if I have no need for the additional features std::string offers over a plain C-string. For me, the point is not using the s literal, but using std::string vs. C-strings in general.

Just as a remark: I have to program a lot on very small embedded devices, especially 8-bit AVRs. Using std::string there results in a lot of overhead. If I have to use a dynamic container because I need its features, it is very good to have one that is well implemented and tested. But if I have no need for it, it is simply too expensive to use.

On a big target like an x86 box, the cost of using std::string instead of a C-string seems negligible. But keeping a small device in mind gives you an idea of what is really happening on big machines too.

Only my two cents!

answered Oct 02 '22 04:10

Klaus

"In what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?"

What is or isn't a good idea tends to vary with the situation.

My choice is to use plain (C-style) literals whenever they are enough (whenever I don't need anything other than a literal). If I need to access anything other than a pointer to the string's first element (the string length, its back(), iterators or anything else), then I use a std::string literal.
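A sketch of the difference described above (function names hypothetical): with a std::string literal the length, last character and iterators are members, whereas with a plain literal the same information needs std::strlen, which scans for the terminating NUL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

using namespace std::string_literals;

// With a std::string literal, the length, last character and iterators are members.
std::size_t count_l_string() {
    auto s = "hello"s;
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), 'l'));
}
char last_char_string() { return "hello"s.back(); }

// With a plain literal, the same information needs std::strlen, which scans for the NUL.
char last_char_c() {
    const char* p = "hello";
    return p[std::strlen(p) - 1];
}
```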

"In all cases, of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well."

Uhh... while the code could indeed throw, this is irrelevant except in very special circumstances (for example, embedded code running at or close to the memory limits of the hardware, or a high-availability application/environment).

In practice, I have never had an out-of-memory condition from writing auto a = "abdce"s; or similar code (and with the small-string optimization used by the major standard library implementations, a string that short typically doesn't allocate at all).

In conclusion, don't bother with the exception safety of out-of-memory situations caused by constructing a std::string literal. If you do encounter an out-of-memory situation doing this, change the code when you find the error.