I just noticed that, according to the most recent C++ ISO standard, the string::pop_back
and string::erase
(to name two, there are probably others) member functions have unspecified complexity. What is the reasoning behind leaving the complexity a choice of the library coders? Are there actually any implementations of which anyone knows that have non-constant complexity for string::pop_back
?
[basic.string] only directly specifies that a few operations have a certain complexity:
size()
: O(1) since C++11max_size()
: O(1) since C++11shrink_to_fit()
: O(n) in C++17 (although added in C++11)operator[]
: O(1) since C++11swap()
: O(1)data()
: O(1) since C++11It indirectly specifies many more (and I'm probably missing a few here):
length()
: equal to size()
empty()
: equal to size()
at()
: equal to operator[]
front()
: equal to operator[]
back()
: equal to operator[]
copy()
: O(n) (comes from its character traits)assign()
: O(n) (comes from its character traits)find()
and variations : O(n) (comes from its character traits)append(char* s)
: O(m) where m
is the length of s
(comes from its character traits)The requirement on data()
here is key. Before C++11, the underlying storage for a string was implementation-defined. It could have been a linked list for all that matters, so long as it could be translated into a c-style array when needed. After C++11, the underlying implementation was still platform dependent, but the requirement that data()
be O(1) all but guaranteed that storage of the string was in a contiguous C-style array (no more lazily copying your linked list). In C++17, data
was overloaded to return a non-const character array that you could modify, and such modifications would be the same as using operator[]
to do the same, which further solidified the implementation (FWIW the storage is still implementation-dependent, but there's no way to satisfy the time complexity requirements otherwise).
You can see that C++03's only performance requirement was on swap
, and this reflects the philosophy of the standard for a long time; prefer to specify only the behavior of an object and let the implementations take care of performance guarantees. This gives library implementers more flexibility to optimize for whatever they saw fit.
When you dig into the proposals that added the complexity wordings (like n2923: Specifying the complexity of size()) you'll see that these complexity requirements are getting added piecemeal as people consider them.
Heck, the non-const data()
was added in C++17 simply because it wasn't proposed before (link to std discussion on the matter (really just links back to an answer our friend Howard Hinnant posted on StackOverflow))
Int n8215, the author discusses basic_string
in detail, citing copy-on-write as one of the initial philosophies behind its design (although it didn't quite get there)
On the other hand, the current specifications of basic_string are intended to allow reference counted strings for which copy-on-write (COW) is essential.
(Wikipedia, citing Scott Meyers agrees).
If the standard was going to allow copy-on-write, it made sense to not make performance guarantees in the standard because writes to a string could either be O(1) or O(N). O(N) would be correct, I suppose, but it's a loose bound that could be misleading.
In fact, Daniel Krügler wondered the same thing as you in LWG 876: basic_string access operations should give stronger guarantees and also cited copy-on-write as a vestige:
Due to earlier support for copy-on-write, we find the following unnecessary limitations for C++0x:
... 2. Missing complexity guarantees:data()
andc_str()
simply return a pointer to their guts, which is guaranteed O(1). This should be spelled out.
So it looks like it's a mixture of allowing implementors flexibility, a discarded idea of supporting copy-on-write strings, and just a lack of people thinking about it.
Feel free to propose time complexities for basic_string functions where they are missing. The standard may be better off for it :-)
A long time ago, there was the idea that you could implement std::string
using ropes. The SGI STL even contained a rope
template with a very string
-like interface.
But it turned out that people used the subscript operator and the c_str()
method a lot, more than efficient concatenation, which is why rope-based implementations of std::string
were not competitive in practice.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With