Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does it matter that the insert hint place which is close the final place is before or after the final place?

Tags:

c++

c++11

I am using the function insert with hint (emplace_hint) in the set/map.

The api doc said that when using hint place it will "begin search the final place from the hint palce and speeding up the process considerably when the actual insertion point is either position or close to it."

I want to know does the close here means before, after or both, and how to effectivly use this feature?

If use lower_bound or upper_bound to find the close place handbefore, it seems not speedup the process.

like image 824
imhuay Avatar asked Apr 04 '18 14:04

imhuay


People also ask

How do you insert a clip in Final Cut?

In Final Cut Pro, select one or more clips in the browser. Move the playhead to the point in the primary storyline or in a selected storyline (or make a range selection) where you want to insert the clip. Do one of the following: Choose Edit > Insert (or press W).

How do you insert notes in Word?

Insert a comment Select the text you want to comment on, or click at the end of the text. On the Review tab, click New Comment. Type your comment. Word shows your comment in a balloon in the document's margin.


1 Answers

The bad news...

We call these types map/set, but what we really mean is tree<pair>/tree. The insert operation on a tree is a lower_bound O(log(N)), followed by the operation to actually add the new value. (Normally the tree is an RB tree so adding may involve a "rotation")

By calling lower_bound then inserting, you are simply implementing insert function the hard way. There is no way this could ever be faster. If it was, we would be asking why this wasn't the default implementation.

The good news...

If what you are really asking is ... "How do I go faster than a map". This is easy. There are a couple of choices, depending upon what is slower - access or insertion/deletion, how many elements are stored, how big the key/value types are, or if there are special reasons why key/value are expensive to move, etc.

1) Hash map, implementation in std as unordered_map This is a technique sometimes called bucket sorting. What you do is supply a function that turns the key into a number. Then you mod the number and use it to look-up into an array of maps. A lot of people mistaken believe means it gives linear access, it doesn't, it is still O(log(N)), just with a smaller N. The downside is it uses more memory. Obviously the hash function needs to be fast too.

2)
For faster lookup, consider a sorted-vector (sometimes called a flat-map, see boost::flatmap). It is simply a std::vector<pair<key,value>>, just sort'ed. Gloriously simple this structure is much faster than map for lookups. It works because the lower_bound function is also a general algorithm, and is still log(N), but vectors are much more cache friendly. Expect it to go around 4 times faster.
Insertion/deletion are O(N) because you have to memcpy memory around....but, as stated above, insertion requires a lower_bound call, or equivalent, and this is faster with sorted-vectors. By experimentation, I have found that vector is faster than set/map for insertion/deletion for structures of less than 4K, but after that it becomes slower, but this is obviously H/W dependent.

3) Don't insert/delete, use a sorted vector as above, but do not call insert/erase, instead simply push_back to insert. To remove, swap the item with back() and resize. This means the sorting will break everytime you insert/erase, so you need a dirty flag. When you do a look-up, you check the dirty flag and if its set, you sort. The std::sort algorithm is real piece of science, it's awesomely fast. I mean really FAST... The only downside is that it is N*log(N), so for large data sets (1000's of elements, or more) you need multiple inserts/erase to pay off, although not as many as people suspect. It's also getting complex enough that you probably need a class/template to abstract it.

For fastest insert/delete... 4) The king of all sorting algorithms, is a radix sort. Implementing 3) with a radix-sort gives you O(N) sorting time. Downside, radix algorithms generally need very large datasets to be worth while, (>10,000 elements typically), which is why they are not normally used. For smaller data-sets you normally find std:sort wins. Plus, I do not know a standard radix sort implementation, so it means writing your own, or a lot of googling. Typically, radix sorting only supports integer keys (although there is no inherent reason why it can't be extended).

For the hackers...
5) set<float&ft; or map<float,X> implies floating point comparisons. Now a real hack is to use an integer compare on floating-point data (as in alias-cast, cast a float* to int* in the compare function). This works because floating point numbers store the exponent in the MSB, so a larger number has a larger exponent. The only problem is the sign flag, it's the wrong way around, so be aware if you ever iterate over the structure, the positive values are sorted low to high. Negative numbers are sorted high to low, meaning that the map isn't really sorted. If you only have either +ve or -ve numbers this doesn't matter. If you only insert/erase/lookup then the absolute storage order isn't important.
(If you want to know why integer comparison is faster, it's because its more complicated, also often CPU's have separate registers for floating point, and access to those registers is slower). Also note, that alias-cast'ing sometimes breaks compiler optimisations, depending on compiler you might need to tell the compiler what your doing. You do this using a compiler flag. On GCC this is no_strict_aliasing, Visual Studio on the other hand doesn't care, and should be fine without)
...
and so on, the point is there are lots of ways to beat the "standard" algorithms, because the standard algorithms involve trading-off pro-and-cons, to balance all cases. If you only have one, known case, you can often beat them. But... if you don't have specific knowledge of what it is you are storing, or how many, then trying to beat the std::XXXXX is a suckers game. The standard algorithms are implemented by people who have been doing this for decades, and know what they are doing. That is why what you trying is silly, if it was that easy, the std::XXXX version would already do it.

like image 50
Tiger4Hire Avatar answered Nov 15 '22 08:11

Tiger4Hire