I recently passed a null pointer to a std::string constructor and got undefined behavior. I'm certain that thousands or tens of thousands of programmers have done this before me, and this same bug has no doubt crashed untold numbers of programs. It comes up a lot when converting code that uses char* to code that uses std::string, and it's the kind of thing that cannot be caught at compile time and can easily be missed in run-time unit tests.
What I'm confused about is the reason for specifying std::string this way.
Why not just define std::string(NULL)==""?
The efficiency loss would be negligible; I doubt it's even measurable in a real program.
Does anyone know what the possible reason for making std::string(NULL) undefined is?
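For illustration, the behavior asked for here can be approximated today with a small wrapper. This is only a sketch: string_or_empty is a hypothetical name, not a standard library function, and it assumes that mapping a null pointer to the empty string is what the calling code wants.

```cpp
#include <string>

// Hypothetical helper (not part of the standard library): treats a null
// pointer as an empty string instead of passing it to the std::string
// constructor, which would be undefined behavior.
inline std::string string_or_empty(const char* s) {
    return s ? std::string(s) : std::string();
}
```

Using it at the boundary where char* values enter std::string code keeps the null check in one place, at the cost of one pointer comparison per call.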
In C++, std::string is an advancement over the traditional null-terminated character array, with some additional features. A null-terminated string is a sequence of characters whose last element is the null character (denoted by '\0').
std::string::data returns a pointer to an array that contains the same sequence of characters as the value of the string object.
std::string::clear sets the string content to an empty string, erasing any previous content and leaving its size at 0 characters.
Because a null pointer does not point to a meaningful object, an attempt to dereference (i.e., access the data stored at that memory location) a null pointer usually (but not always) causes a run-time error or immediate program crash.
No good reason as far as I know.
Someone just proposed a change to this a month ago. I encourage you to support it.
std::string is not the best example of well-done standardization. The version initially standardized was impossible to implement; the requirements placed on it were not consistent with each other.
At some point that inconsistency was fixed.
In C++11 the rules were changed in a way that prevents COW (copy-on-write) implementations, which broke the ABI of existing, reasonably compliant std::string implementations. This change may have been the point where the inconsistency was fixed; I do not recall.
Its API differs from that of the rest of std's containers because it didn't come from the same pre-std STL.
Treating this legacy behavior of std::string as some kind of reasoned decision that takes performance costs into account is not realistic. If any such testing was done, it was 20+ years ago on a non-standard-compliant std::string (because no compliant one could exist; the standard was inconsistent).
It continues to be UB to pass (char const*)0 or nullptr due to inertia, and will remain so until someone makes a proposal and demonstrates that the cost is tiny while the benefit is not.
Constructing a std::string from a literal char const[N] is already a low-performance solution; you have the size of the string at compile time, drop it on the ground, and then at run time walk the buffer to find the '\0' character (unless that is optimized away; and if so, the null check is equally optimizable). The high-performance solution involves knowing the length and telling std::string about it instead of copying from a '\0'-terminated buffer.
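As a rough sketch of "telling std::string about the length", the array size of a literal can be captured by a template and forwarded to the pointer-plus-count constructor. from_literal is a hypothetical helper introduced here for illustration only; it is not a standard facility.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: deduces the array size N of a string literal at
// compile time. N counts the terminating '\0', so N - 1 characters are
// copied and no run-time scan for the terminator is needed.
template <std::size_t N>
std::string from_literal(const char (&lit)[N]) {
    return std::string(lit, N - 1);
}
```

One difference from the plain constructor: because the length is given explicitly, embedded '\0' characters in the literal are kept rather than ending the string early.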
The sole reason is: Runtime performance.
It would indeed be easy to define that std::string(NULL) results in the empty string. But it would cost an extra check at the construction of every std::string from a const char *, which can add up.
On the balance between absolute maximum performance and convenience, C++ always goes for absolute maximum performance, even if that means compromising the robustness of programs.
The most famous example is to not initialize POD member variables in classes by default: Even though in 99% of all cases programmers want all POD member variables to be initialized, C++ decides not to do so to allow the 1% of all classes to achieve slightly higher runtime performance. This pattern repeats itself all over the place in C++. Performance over everything else.
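A small sketch of the POD case described above, using two hypothetical struct names. It assumes C++11 or later, where default member initializers are available as the opt-in for safe defaults.

```cpp
// Members are left indeterminate by `Uninit u;` and reading them before
// assignment is undefined behavior. `Uninit u{};` value-initializes them
// to zero instead.
struct Uninit {
    int x;
    int y;
};

// Default member initializers (C++11) make every construction start from
// known values, which is the opt-in the language offers.
struct Defaulted {
    int x = 0;
    int y = 0;
};
```

The safe form has to be requested by the programmer; the language default stays the cheaper, uninitialized one.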
There is no "the performance impact would be negligible" in C++. :-)
(Note that I personally do not like this behavior in C++ either. I would have made it so that the default behavior is safe, and that the unchecked and uninitialized behavior has to be requested explicitly, for example with an extra keyword. Uninitialized variables are still a major problem in a lot of programs in 2018.)