Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest insert C++

Tags:

c++

arrays

vector

Is there any way to insert data in any position in an array/vector pretty fast, given the array/vector is very large?

If I use vector::insert the vector will move all the items after my item and this will take much time, if for example the vector got 1b items and this is performed at the middle of the vector, the vector will move 500m items.

Is there any efficient way to do this with C style/C++ array or Vector?

like image 812
Luka Avatar asked Jan 06 '14 23:01

Luka


2 Answers

C++ vectors are really arrays with smart resizing, if you need to insert at the front of a container you need to consider something else like a linked list. However, have in mind that if performance is very critical to you then such other containers are usually not contiguous in memory so will often incur memory paging issues as you traverse them and jump around in memory. It's always a trade off.

like image 26
naumcho Avatar answered Nov 18 '22 11:11

naumcho


Theory (Specifically Big O notation) says that linked lists have O(1) complexity for insertion and removal, and dynamic arrays have O(n), due to the container shifting that the erase/insert needs.

But thats theory. Computers are not only theory. In practice its far different.

Modern computers have their memory arranged in a so called memory-hierarchy, that is, a set o different types of memory devices sorted by its speed:

+---------------+
| CPU registers |   ^
+---------------+   |
|   L1 cache    |   |
|      ...      |   |   Less apacity
|   LN cache    |   |   Faster access
+---------------+   |   More expensive hardware
|      RAM      |   |
+---------------+   
|      HDD      |
+---------------+

As the diagram shows, the hierarchy is organized through the speed of the memory access. But note that more speed means more expensive hardware, so this translates to many slow-access memory and a few fast memory.
So one way to boost your program performance is to hold frecuently used data in fast memory, and only go to the slow memory when is necessary (The program requests data which is not loaded in the fast memory, for example).

Thats exactly what the hardware does, supposing two behaviors:

  • If a piece of data is requested, probably data near to it will be requested in the future. This is called Locality of reference. Consider how we use arrays, for example.
  • If a piece of data is requested, probably it will be requested again in the future. Thats called temporal locality. Again, iterating over and array again and again is an example of that.

Of course memory is limited, so a request of new data which will be loaded in a certain level of the hierarchy discards the data which was there before.

So, why is this important for the performance of different containers?

Remember how a linked-list (std::list is a linked-list) woks:
A linked list is a chain of separated nodes conected between them through pointers:

+---+     +---+             +---+
| 1 | --> | 2 | --> ... --> | N |
+---+     +---+             +---+

On the other hand, a dynamic array (std::vector is a dynamic array) is a continuous chunk of memory:

+---+---+-----+---+
| 1 | 2 | ... | N |
+---+---+-----+---+

As I said above, the theory sais that linked-list insertion/removal has O(1) complexity, because "is just changing pointers". But consider how do you access the memory to do that. Have you notices that process does not fulfill the space-locality rule?. So this has a lot of misses (cache misses), that is, new memory out of the fast memory is requested, and performance goes down.
In fact, even if theory says that transversing a linked-list has O(n) complexity, in practice together with the complexity are the continous performance hits due to the cache misses.

Now consider how a dynamic array works: Its true that inserting/removing has O(n) complexity, because you have to shift one position one side of the array to leave a gap for the new element, or eliminate that gap if you are erasing.
But remember that an array is a contiguous block of memory, so if you are using it, probably that array is totally (Or almost partially) loaded into the fast memory (sapce-locality), so the shiftting process is really fast.

So as you can see, the dynamic array is far faster than the linked list in modern architectures.

In general, std::vector has many advantages opposed to std::list:

  • Its cache-frindly. As we have discussed, std::list is not, and "cache-friendliness" its very important for performance nowadays. This leads to fast insertion/removal, fast random access, and fast copying.
  • Linked-lists do one dynamic-memory operation per insertion/deletion. Calls to malloc()/free() (That is, calls to the OS to retrieve/leave heap memory) take a lot of time. On the other hand, dynamic arrays only do de/allocations in a O(logn) average.

But thats not all: In a few cases, std::list must be preferred, cases where the cost of copying/moving elements is very high. The point of the std::vector performance is based on the cheap shifting process done at the cache, but if the elements of the vector have certain expensive copy/move semantics, there is no benefit at all and the list perform better than the vector.

Read this article for more information about the topic.

like image 143
Manu343726 Avatar answered Nov 18 '22 11:11

Manu343726