Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ string functions - Why do they often have unspecified complexity?

I just noticed that, according to the most recent C++ ISO standard, the string::pop_back and string::erase (to name two, there are probably others) member functions have unspecified complexity. What is the reasoning behind leaving the complexity a choice of the library coders? Are there actually any implementations of which anyone knows that have non-constant complexity for string::pop_back?

like image 777
user904963 Avatar asked Oct 06 '13 21:10

user904963


2 Answers

TLDR; Because history, and that time complexities haven't been proposed yet

The situation:

[basic.string] only directly specifies that a few operations have a certain complexity:

  • size() : O(1) since C++11
  • max_size() : O(1) since C++11
  • shrink_to_fit() : O(n) in C++17 (although added in C++11)
  • operator[] : O(1) since C++11
  • swap() : O(1)
  • data() : O(1) since C++11

It indirectly specifies many more (and I'm probably missing a few here):

  • length() : equal to size()
  • empty() : equal to size()
  • at() : equal to operator[]
  • front() : equal to operator[]
  • back() : equal to operator[]
  • copy() : O(n) (comes from its character traits)
  • assign() : O(n) (comes from its character traits)
  • find() and variations : O(n) (comes from its character traits)
  • append(char* s) : O(m) where m is the length of s (comes from its character traits)

The requirement on data() here is key. Before C++11, the underlying storage for a string was implementation-defined. It could have been a linked list for all that matters, so long as it could be translated into a c-style array when needed. After C++11, the underlying implementation was still platform dependent, but the requirement that data() be O(1) all but guaranteed that storage of the string was in a contiguous C-style array (no more lazily copying your linked list). In C++17, data was overloaded to return a non-const character array that you could modify, and such modifications would be the same as using operator[] to do the same, which further solidified the implementation (FWIW the storage is still implementation-dependent, but there's no way to satisfy the time complexity requirements otherwise).

You can see that C++03's only performance requirement was on swap, and this reflects the philosophy of the standard for a long time; prefer to specify only the behavior of an object and let the implementations take care of performance guarantees. This gives library implementers more flexibility to optimize for whatever they saw fit.

What's happened historically

When you dig into the proposals that added the complexity wordings (like n2923: Specifying the complexity of size()) you'll see that these complexity requirements are getting added piecemeal as people consider them.

Heck, the non-const data() was added in C++17 simply because it wasn't proposed before (link to std discussion on the matter (really just links back to an answer our friend Howard Hinnant posted on StackOverflow))

Copy-On-Write

Int n8215, the author discusses basic_string in detail, citing copy-on-write as one of the initial philosophies behind its design (although it didn't quite get there)

On the other hand, the current specifications of basic_string are intended to allow reference counted strings for which copy-on-write (COW) is essential.

(Wikipedia, citing Scott Meyers agrees).

If the standard was going to allow copy-on-write, it made sense to not make performance guarantees in the standard because writes to a string could either be O(1) or O(N). O(N) would be correct, I suppose, but it's a loose bound that could be misleading.

In fact, Daniel Krügler wondered the same thing as you in LWG 876: basic_string access operations should give stronger guarantees and also cited copy-on-write as a vestige:

Due to earlier support for copy-on-write, we find the following unnecessary limitations for C++0x:
... 2. Missing complexity guarantees: data() and c_str() simply return a pointer to their guts, which is guaranteed O(1). This should be spelled out.


So it looks like it's a mixture of allowing implementors flexibility, a discarded idea of supporting copy-on-write strings, and just a lack of people thinking about it.

Feel free to propose time complexities for basic_string functions where they are missing. The standard may be better off for it :-)

like image 130
AndyG Avatar answered Oct 24 '22 01:10

AndyG


A long time ago, there was the idea that you could implement std::string using ropes. The SGI STL even contained a rope template with a very string-like interface.

But it turned out that people used the subscript operator and the c_str() method a lot, more than efficient concatenation, which is why rope-based implementations of std::string were not competitive in practice.

like image 33
Florian Weimer Avatar answered Oct 24 '22 01:10

Florian Weimer