 

What does "constant" complexity really mean? Time? Count of copies/moves? [closed]

I can think of three operations in C++ that can be described in some sense as having 'constant' complexity. I've seen some debate(*) over what this means, and it seems to me that we could just say "all these operations are constant, but some are more constant than others" :-)

(Edit 2: If you already think you know the answer, please read some of the debate at this question before rushing in too soon: What data structure, exactly, are deques in C++? Many people, with quite high scores, are arguing over what 'constant' means. My question isn't as obvious as you might think!)

(Edit: I don't need a primer in what 'complexity' means. I want a clear answer, perhaps with quotes from the C++ standard, that tells us exactly what is supposed to be constant. Processor ticks, real-world time, or something else? On other threads, some people have argued that time is totally irrelevant to what is required by the C++ standard.)

Do these complexity guarantees in the standard apply to the runtime of the operation? Or do they simply specify the (maximum) number of copies/moves of the contained objects that might happen? And what exactly does 'amortized' mean?

1. Given a (non-empty) vector<int> v, the following is clearly constant in runtime:

swap(v.front(), v.back());

(although a pedant might point out that it depends on whether the data is in the cache or swapped out or whatever!).

2. Given a list<int> l, doing a push_back is straightforward. Exactly one new node is allocated and a few pointers in the linked list are shuffled. Each push_back involves one allocation, always of the same amount of memory, so this is clearly fairly 'constant'. But, of course, the time to do the allocation could be quite variable: the memory manager might have to take a lot of time to find some suitable free memory.
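
To make "a few pointers are shuffled" concrete, here is a minimal sketch of a doubly linked list push_back (not std::list's actual implementation; the TinyList name and layout are mine, just for illustration). Each call performs exactly one fixed-size allocation and a fixed number of pointer updates, however many nodes already exist:

#include <cstddef>
#include <iostream>

struct Node {
    int value;
    Node* prev = nullptr;
    Node* next = nullptr;
};

struct TinyList {
    Node* head = nullptr;
    Node* tail = nullptr;
    std::size_t size = 0;

    void push_back(int v) {
        Node* n = new Node{v};      // exactly one allocation, of a fixed size
        n->prev = tail;             // a fixed number of pointer updates,
        if (tail) tail->next = n;   // none of which depend on the list's size
        else head = n;
        tail = n;
        ++size;
    }

    ~TinyList() {
        for (Node* n = head; n != nullptr; ) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }
};

int main() {
    TinyList l;
    for (int i = 0; i < 1000; ++i) l.push_back(i);  // each call does the same work
    std::cout << l.size << '\n';
}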

3. But doing a push_back on a vector<int> is even more unpredictable. Most of the time it will be very quick, but every now and then it will have to reallocate space for all the data and copy every element to a new location. So it's less predictable in terms of runtime than a single list::push_back, but it's still referred to as constant (amortized). On average, each push_back in a long sequence of insertions takes an amount of time that is independent of how much data is already in the vector, and this is why it's called 'amortized constant' time. (Am I right?)
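
To see where the 'amortized' part comes from, here is a rough sketch using an instrumented element type purely to count copies/moves (the Counted type and the g_moves counter are mine, not anything standard). With the geometric growth that real implementations use, the total number of element copies/moves over N push_backs stays proportional to N, so the average per push is a small constant even though individual pushes occasionally spike:

#include <iostream>
#include <vector>

static long g_moves = 0;

struct Counted {
    int v = 0;
    Counted(int x) : v(x) {}
    Counted(const Counted& o) : v(o.v) { ++g_moves; }
    Counted(Counted&& o) noexcept : v(o.v) { ++g_moves; }
};

int main() {
    const int N = 1'000'000;
    std::vector<Counted> vec;
    for (int i = 0; i < N; ++i) vec.push_back(Counted(i));
    // Total moves/copies grow linearly with N (roughly 2-3x N with typical
    // growth factors), so the cost per push is bounded by a small constant:
    // amortized constant time.
    std::cout << "pushes: " << N << ", element moves/copies: " << g_moves
              << ", per push: " << double(g_moves) / N << '\n';
}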

And finally, I asked about int to avoid the complexities of having another type. For example, a vector< vector<int> > might be a little more complicated to reason about because each element of the vector (of vectors) might have a different size and, for example, swapping two elements is not quite as constant as in case 1. above. But ideally, we could answer for all vector<T>, not just T=int.

(*) For an example of the debates, see the comments to these answers: What data structure, exactly, are deques in C++?

asked Dec 25 '11 by Aaron McDaid


2 Answers

Complexity is always stated relative to a specific variable or set of variables. So when the standard talks about constant time insertion, it is talking about constant time relative to the number of items in the list. That is, O(1) insertions means that the number of items currently in the list does not affect the overall complexity of insertions. The list could have 500 or 50000000 items in it, and the complexity of the insertion operation will be the same.

For example, std::list has O(1) insertions and deletions; the complexity of insertions is not affected by the number of elements in the list. However, the memory allocator's complexity may well depend on the number of things that have already been allocated. But since the O(1) is talking about number of items in the list, it doesn't cover that. Nor is it supposed to, because then we would be measuring the complexity of the memory allocator, not the data structure.

In short: that's a different measurement.

It implies that we are free to implement our algorithm as badly as we like, including one where the time isn't really constant in any pragmatic sense, but where we have respected the number of 'operations' on the contained objects.

Complexity is not specified relative to implementations. It is specified relative to algorithms. It doesn't matter that a context switch could occur, because running time is not what complexity is for.

As above, you could implement std::list with a memory allocator that is O(log(n)) with respect to deletions (where n is the number of allocations). However, the deletion of an element in the list would still be O(1) with respect to the number of items in the list.
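
For instance, a small sketch of what that O(1) guarantee covers: erasing through a list iterator relinks a constant number of pointers regardless of the list's size; whatever the allocator then does to release the node is a separate measurement:

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> l = {1, 2, 3, 4, 5};
    auto it = std::next(l.begin(), 2);  // iterator to the element holding 3
    l.erase(it);                        // O(1) in the number of list elements:
                                        // relink two pointers, then free one node
    for (int x : l) std::cout << x << ' ';
    std::cout << '\n';
}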

Do not confuse complexity with overall performance. The purpose of complexity is to have a general metric for algorithms with respect to different variables. The purpose of a programmer, who wants their code to run fast, is to find a reasonable implementation of an algorithm that conforms to the complexity that they need to achieve that performance.

Complexity is a tool for rating the efficiency of an algorithm. Complexity does not mean you get to stop thinking.

And what exactly does 'amortized' mean?

Amortized means this:

If something is "amortized X time", then as you repeat the operation infinitely many times on the same data structure, the complexity converges to X in the limit.

So, std::vector has "amortized constant time" insertions at the rear. So if we take the object and perform infinitely many insertions on it, the asymptotic limit of complexity will be no different than a "constant time" insertion.

In layman's terms, it means that the operation may sometimes be non-constant, but such non-constant operations become increasingly rare. Averaged over a long run of insertions, it's constant time.
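
A small sketch of that intuition (it just logs capacity changes; the exact growth pattern is implementation-defined): the expensive, non-constant pushes are the reallocations, and they become rarer and rarer as the vector grows:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {          // a reallocation happened on this push
            cap = v.capacity();
            std::cout << "realloc at size " << v.size()
                      << ", new capacity " << cap << '\n';
        }
    }
}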

answered Sep 28 '22 by Nicol Bolas


As far as I understand it, constant complexity means that the operation is O(1): you can tell in advance how many elementary operations (reads/writes, assembly instructions, whatever) the operation is going to take, and this bound is a common bound for all possible states of the target object. There is a catch here: in a multithreaded environment you cannot predict thread switches, so you can reason about the elapsed time of an operation only in a real-time OS.

Amortized constant complexity is an even weaker guarantee. Summarizing the answers from here, it's a guarantee that on average your operation is constant. This means that the number of elementary operations for N consecutive operations is O(N). The number of elementary operations per operation is roughly O(1), but some rare jumps are allowed. For example, adding an item at the vector's tail is usually constant, but sometimes an additional heavy operation is needed; having amortized constant time here means that the additional operation happens rarely enough, and takes a predictable enough amount of time, that the total time of N operations is still O(N). Of course, the same catch applies here as well.

So, answering your questions:

  1. The complexity guarantees in the standard really apply only to the number of machine instructions needed to execute the operation, and don't imply that the runtime is somehow bounded. (Indeed, until recently C++ didn't even have a language-specified notion of a thread, so from the C++ standard's point of view at that time, the program was executed on a dedicated C++ machine.)
  2. Amortized means "bounded by a constant on average", which usually means that the operation time is almost always constant-bounded, with some quite rare deviations (a rough worked example follows).
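
A back-of-the-envelope check of that claim, assuming a growth factor of 2 and a starting capacity of 1 (real implementations vary): the element copies caused by all reallocations on the way to size N sum to less than N, so the total work for N pushes is O(N) and the average per push is a constant:

#include <iostream>

int main() {
    const long N = 1'000'000;
    long copies = 0;
    for (long cap = 1; cap < N; cap *= 2)
        copies += cap;                 // each reallocation copies 'cap' elements
    std::cout << "reallocation copies for " << N << " pushes: " << copies
              << " (~" << double(copies) / N << " per push)\n";
}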

Edit:
You can look at, e.g., section 23.1 of the C++ standard:

All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.

answered Sep 28 '22 by Vlad