So C++14 introduced a number of user-defined literals, one of which is the "s" literal suffix for creating std::string objects. According to the documentation, its behavior is exactly the same as constructing a std::string object, like so:
auto str = "Hello World!"s; // RHS is equivalent to: std::string{ "Hello World!" }
Of course, constructing an unnamed std::string object could be done prior to C++14, but because the C++14 way is so much simpler, I think far more people will actually consider constructing std::string objects on the spot than before, which is why I thought it made sense to ask this.
So my question is simple: in what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?
Consider the following:
void foo(std::string arg);
foo("bar"); // option 1
foo("bar"s); // option 2
If I'm correct, the first method will call the appropriate constructor overload of std::string to create an object inside foo's scope, and the second method will construct an unnamed string object first, and then move-construct foo's argument from that. I'm sure that compilers are very good at optimizing stuff like this, but still, the second version seems to involve an extra move compared with the first alternative (not that a move is expensive, of course). But again, after compiling this with a reasonable compiler, the end result is most likely to be highly optimized and free of redundant moves/copies anyway.
Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.
Consider the following:
std::cout << "Hello World!" << std::endl; // option 1
std::cout << "Hello World!"s << std::endl; // option 2
In this case, the std::string object is probably passed to cout's operator via rvalue reference, and the first option probably passes a pointer, so both are very cheap operations, but the second one has the extra cost of constructing an object first. It's probably a safer way to go though (?).
In all cases, of course, constructing a std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well. This is more of an issue in the second example though, as in the first example a std::string object will be constructed in both cases anyway. In practice, getting an exception from constructing a string object is very unlikely, but it could still be a valid argument in certain cases.
If you can think of more examples to consider, please include them in your answer. I'm interested in general advice regarding the usage of unnamed std::string objects, not just these two particular cases. I only included these to point out some of my thoughts regarding this topic.
Also, if I got something wrong, feel free to correct me as I'm not by any means a C++ expert. The behaviors I described are only my guesses on how things work, and I didn't base them on actual research or experimenting really.
In general, prefer plain string literal notation when it is enough. It is easier to read and it gives the compiler the best chance to optimize the code.
std::string is compatible with the standard library algorithms and containers. C strings are not char * or const char * as such; they are null-terminated character arrays, and string literals are likewise arrays of const char.
The C++ standard library represents a sequence of characters as an object of the class std::string. It stores the characters as a sequence of bytes and allows access to each single-byte character.
Note that in C++14 std::string is not a literal type (it has a non-trivial destructor); "..."s is a user-defined literal whose operator returns a std::string by value.
In what cases it's a good (or bad) idea construct an unnamed std::string object, instead of simply using a C-style string literal?
A std::string literal is a good idea when you specifically want a variable of type std::string, whether for:
- modifying the value later (auto s = "123"s; s += '\n';)
- the richer, intuitive and less error-prone interface (value semantics, iterators, find, size etc.)
  - value semantics means ==, <, copying etc. work on the values, unlike the pointer/by-reference semantics after C-string literals decay to const char*
- calling some_templated_function("123"s), which would concisely ensure a <std::string> instantiation, with the argument being able to be handled using value semantics internally
  - if other code already instantiates the template for std::string anyway, and it's of significant complexity relative to your resource constraints, you might want to pass a std::string too to avoid an unnecessary additional instantiation for const char*, but it's rare to need to care
- values containing embedded NULs
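The points in the list above can be sketched in a few lines. This is only an illustration under assumptions: length_of and string_literal_demo are hypothetical names invented here, and the asserts state what I would expect a conforming C++14 compiler to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using namespace std::string_literals;  // makes the "..."s suffix visible

// Hypothetical helper: converts whatever the template receives to a
// std::string and reports its length. With a plain literal the conversion
// goes through const char* and stops at the first NUL.
template <typename T>
std::size_t length_of(const T& value) {
    return std::string(value).size();
}

void string_literal_demo() {
    // Modifying the value later: s is deduced as std::string, so += works.
    auto s = "123"s;
    s += '\n';
    assert(s.size() == 4);

    // Value semantics: == and < compare contents, not addresses.
    assert("abc"s == "abc"s);
    assert("abc"s < "abd"s);

    // Embedded NULs survive because operator""s receives the length.
    assert("ab\0cd"s.size() == 5);
    assert(std::string("ab\0cd").size() == 2);

    // The template sees a std::string in one call and a char array in the other.
    assert(length_of("ab\0cd"s) == 5);
    assert(length_of("ab\0cd") == 2);
}
```

Calling string_literal_demo() from main should pass every assert; the embedded-NUL lines are the case where the two spellings genuinely produce different values.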
A C-style string literal might be preferred where:
- pointer-style semantics are wanted (or at least not a problem)
- the value's only going to be passed to functions expecting const char* anyway, or std::string temporaries will get constructed anyway and you don't care that you're giving your compiler's optimiser one extra hurdle to leap to achieve compile or load time construction if there's potential to reuse the same std::string instance (e.g. when passing to functions by const-reference) - again it's rare to need to care
- (another rare and nasty hack) you're somehow leveraging your compiler's string pooling behaviour, e.g. if it guarantees that for any given translation unit the const char* to string literals will only (but of course always) differ if the text differs
  - you can't get the same from std::string .data()/.c_str(), as the same address may be associated with different text (and different std::string instances) during the program execution, and std::string buffers at distinct addresses may contain the same text
- you benefit from having the pointer remain valid after a std::string would leave scope and be destroyed (e.g. given enum My_Enum { Zero, One }; - const char* str(My_Enum e) { return e == Zero ? "0" : "1"; } is safe, but const char* str(My_Enum e) { return e == Zero ? "0"s.c_str() : "1"s.c_str(); } isn't, and std::string str(My_Enum e) { return e == Zero ? "0"s : "1"s; } smacks of premature pessimism in always using dynamic allocation (sans SSO, or for longer text))
- you're leveraging compile-time concatenation of adjacent C-string literals (e.g. "abc" "xyz" becomes one contiguous const char[] literal "abcxyz") - this is particularly useful inside macro substitutions
- you're memory constrained and/or don't want to risk an exception or crash during dynamic memory allocation
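Two of the cases listed above, pointer lifetime and adjacent-literal concatenation, are easy to show in code. The following is a small sketch; name_of, greeting, LOG_PREFIX and c_literal_demo are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstring>

enum My_Enum { Zero, One };

// Safe: string literals have static storage duration, so the returned
// pointer stays valid after the function returns.
const char* name_of(My_Enum e) {
    return e == Zero ? "0" : "1";
}

// The "..."s.c_str() variant from the list is deliberately not compiled
// here: the temporary std::string is destroyed at the end of the return
// statement and the pointer would dangle.

// Hypothetical macro: adjacent literals are joined at compile time, which
// is what makes this kind of macro substitution work.
#define LOG_PREFIX "[demo] "

const char* greeting() {
    return LOG_PREFIX "hello" ", " "world";  // one const char[] literal
}

void c_literal_demo() {
    assert(std::strcmp(name_of(Zero), "0") == 0);
    assert(std::strcmp(name_of(One), "1") == 0);
    assert(std::strcmp(greeting(), "[demo] hello, world") == 0);
}
```

Neither function allocates, which is also why this style suits the memory-constrained case mentioned at the end of the list.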
[basic.string.literals] 21.7 lists:
string operator "" s(const char* str, size_t len);
Returns: string{str, len}
Basically, using ""s is calling a function that returns a std::string by value - crucially, you can bind the result to a const reference or an rvalue reference, but not to a non-const lvalue reference.
When used to call void foo(std::string arg);, arg will indeed be move constructed (a move the compiler is allowed to elide, and which C++17 removes altogether).
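To make the reference-binding rules concrete, here is a minimal sketch. The overload set which is hypothetical and exists only to report which overload the compiler selects; it is not part of any library.

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;

// Hypothetical overloads: the return value identifies the one chosen.
int which(const std::string&) { return 1; }
int which(std::string&&)      { return 2; }
int which(const char*)        { return 3; }

void reference_binding_demo() {
    const std::string& cref = "bar"s;  // OK: temporary's lifetime is extended
    std::string&& rref = "bar"s;       // OK: rvalue reference binds to the temporary
    // std::string& lref = "bar"s;     // error: non-const lvalue reference
    assert(cref == rref);

    assert(which("bar"s) == 2);  // temporary std::string prefers the && overload
    assert(which("bar") == 3);   // array-to-pointer decay beats a user-defined conversion

    std::string named = "bar";
    assert(which(named) == 1);   // an lvalue can only bind to const std::string&
}
```

If the const char* overload is removed, which("bar") would instead build a temporary std::string and pick the && overload, so with only std::string overloads present both spellings end up in the same place.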
Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.
It doesn't matter much which you choose. Maintenance-wise, if foo(const std::string&) is ever changed to foo(const char*), only foo("xyz"); invocations will seamlessly continue working, but there are very few vaguely plausible reasons it might be (so C code could call it too? - but still it'd be a bit mad not to continue to provide a foo(const std::string&) overload for existing client code; so it could be implemented in C? - perhaps; removing dependency on the <string> header? - irrelevant with modern computing resources).
std::cout << "Hello World!" << std::endl; // option 1
std::cout << "Hello World!"s << std::endl; // option 2
The former will call operator<<(std::ostream&, const char*), directly accessing the constant string literal data, with the only disadvantage being that the streaming may have to scan for the terminating NUL. "Option 2" would match a const-reference overload and implies construction of a temporary, though compilers might be able to optimise it so they're not doing that unnecessarily often, or even effectively create the string object at compile time (which might only be practical for strings short enough to use an in-object Short String Optimisation (SSO) approach). If they're not doing such optimisations already, the potential benefit and hence pressure/desire to do so is likely to increase.
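A short sketch of the two streaming paths, using std::ostringstream instead of std::cout so the result can be inspected. The helper name streamed is an assumption of this example. For ordinary text the output is identical; the only observable difference I am aware of appears with an embedded NUL, where the const char* overload stops scanning early.

```cpp
#include <cassert>
#include <sstream>
#include <string>

using namespace std::string_literals;

// Hypothetical helper: streams the argument and returns what was written.
// A plain literal arrives as a char array and decays to const char*;
// a "..."s literal arrives as a std::string.
template <typename T>
std::string streamed(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

void streaming_demo() {
    // Ordinary text: both options write the same characters.
    assert(streamed("Hello World!") == streamed("Hello World!"s));

    // Embedded NUL: the const char* overload stops at the terminator,
    // while the std::string overload writes all size() characters.
    assert(streamed("Hello\0World!") == "Hello");
    assert(streamed("Hello\0World!"s).size() == 12);
}
```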
First, I believe the answer is opinion-based!
For your example 1, you already mentioned all the important arguments for using the new s literal. And yes, I expect the result to be the same, so I see no need to spell out that I want a std::string in the definition.
One argument can be that a constructor is declared explicit, so an automatic type conversion will not happen. In that situation an s literal is helpful.
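A related situation where the suffix makes a visible difference, which may or may not be exactly what is meant above, is a type that is constructible from std::string. Implicit conversion allows only one user-defined conversion, so a plain literal cannot reach such a type, while a "..."s literal can. Label and label_length below are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

using namespace std::string_literals;

// Hypothetical type with a (non-explicit) constructor taking std::string.
struct Label {
    Label(std::string text) : text_(std::move(text)) {}
    std::string text_;
};

std::size_t label_length(Label label) { return label.text_.size(); }

// const char* -> std::string -> Label would need two user-defined
// conversions, which implicit conversion does not perform.
static_assert(!std::is_convertible<const char*, Label>::value,
              "a plain literal does not convert implicitly to Label");
static_assert(std::is_convertible<std::string, Label>::value,
              "a std::string converts implicitly to Label");

void label_demo() {
    // label_length("abc");             // would not compile
    assert(label_length("abc"s) == 3);  // one conversion: std::string -> Label
}
```

If the Label constructor were declared explicit, neither call would compile and the argument would have to be written as Label{"abc"}.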
But it is a matter of taste, I think!
For your example 2, I tend to use the "old" C-string version, because generating a std::string object has overhead. Giving cout a pointer to the string is well defined, and I see no use case where the std::string would bring some benefit.
So my personal advice, for now (new information becomes available every day :-) ), is to use a C-string if it fits my needs exactly. This means: the string is constant, will never be copied or modified, and is only used "as is". In that case a std::string simply has no benefit.
And the s literal comes into use where I need to state that the value is a std::string.
In short: I do not use a std::string if I have no need for the additional features which std::string offers over an old C-string. For me the point is not the s literal itself but std::string vs. C-strings in general.
Just as a remark: I have to program a lot on very small embedded devices, especially 8-bit AVRs. Using std::string results in a lot of overhead. If I have to use a dynamic container because I need its features, it is very good to have one which is well implemented and tested. But if I have no need for it, it is simply too expensive to use.
On a big target like an x86 box, the cost of using std::string instead of a C-string seems negligible. But having a small device in mind gives you an idea of what is really happening on big machines too.
Only my two cents!
In what cases it's a good (or bad) idea construct an unnamed std::string object, instead of simply using a C-style string literal?
What is or is not a good idea tends to vary with the situation.
My choice is to use raw literals whenever they are enough (whenever I don't need anything other than a literal). If I need to access anything other than a pointer to the first element of the string (the string length, its back, iterators or anything else), then I use a std::string literal.
In all cases of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well.
Uhh ... while the code could indeed throw, this is irrelevant except in very special circumstances (for example, embedded code running at - or close to - the memory limits of the hardware, or a high-availability application/environment).
In practice, I have never had an out-of-memory condition from writing auto a = "abdce"s; or other similar code.
In conclusion, don't bother with the exception safety of out-of-memory situations coming from instantiating a std::string literal. If you encounter an out of memory situation doing this, change the code when you find the error.