Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

emplace_hint performance when hint is wrong

Tags:

c++

multimap

I am trying to determine if emplace_hint should be used to insert a key into a multimap (as opposed to regular emplace). I have already calculated the range of the key in an earlier operation (on the same key):

range = multimap.equal_range(key); 

Should I use range.first, range.second, or nothing as a hint to insert the key, value pair? What if the range is empty?

like image 658
user9866374 Avatar asked Jun 04 '18 18:06

user9866374


2 Answers

Should I use range.first, range.second, or nothing as a hint to insert the key, value pair?

As std::multimap::emplace_hint() states:

Inserts a new element into the container as close as possible to the position just before hint.

(emphasis is mine) you should use second iterator from range and it should make insertion more efficient:

Complexity

Logarithmic in the size of the container in general, but amortized constant if the new element is inserted just before hint.

as for empty range, it is still fine to use second iterator as it should always point to greater than element or behind the last if not such one exists.

like image 129
Slava Avatar answered Nov 13 '22 06:11

Slava


First, performance wise, it will not make any difference if you use range.first or range.second. Let's have a look at the return value of equal_range:

std::equal_range - return value

std::pair containing a pair of iterators defining the wanted range, the first pointing to the first element that is not less than value and the second pointing to the first element greater than value. If there are no elements not less than value, last is returned as the first element. Similarly if there are no elements greater than value, last is returned as the second element

This means that - when obtained for a value key - both range.first and range.secod are represent positions wherekeymay be correctly inserted right before. So performance wise it should not matter if you userange.firstorrange.last`. Complexity should be "amortized constant", since the new element is inserted just before hint.

Second, when the range is "empty", range.first and range.second are both one-past-the-end, and therefore performance as well as result are identical, actually the same as if you used emplace without any hint.

See the following program demonstrating this:

int main()
{
    std::multimap<std::string, std::string> m;

    // some clutter:
    m.emplace(std::make_pair(std::string("k"), std::string("1")));
    m.emplace(std::make_pair(std::string("k"), std::string("2")));
    m.emplace(std::make_pair(std::string("z"), std::string("1")));
    m.emplace(std::make_pair(std::string("z"), std::string("2")));

    // relevant portion of demo data: order a-c-b may be preserved
    m.emplace(std::make_pair(std::string("x"), std::string("a")));
    m.emplace(std::make_pair(std::string("x"), std::string("c")));
    m.emplace(std::make_pair(std::string("x"), std::string("b")));


    auto r = m.equal_range("x");
    // will insert "x.zzzz" before "x.a":
    m.emplace_hint(r.first, std::make_pair(std::string("x"), std::string("zzzz")));

    // will insert "x.0" right after "x.b":
    m.emplace_hint(r.second, std::make_pair(std::string("x"), std::string("0")));

    auto rEmpty = m.equal_range("e");
    // "empty" range, normal lookup:
    m.emplace_hint(rEmpty.first, std::make_pair(std::string("e"), std::string("b")));
    m.emplace_hint(rEmpty.second, std::make_pair(std::string("e"), std::string("a")));

    auto rWrong = m.equal_range("k");
    m.emplace_hint(rWrong.first, std::make_pair(std::string("z"), std::string("a")));

    for (const auto &p : m) {
        std::cout << p.first << " => " << p.second << '\n';
    }
}

Output:

e => b
e => a
k => 1
k => 2
x => zzzz
x => a
x => c
x => b
x => 0
z => a
z => 1
z => 2

In short: if you have a valid range for key pre-calculated, then use it when inserting key. It will help anyway.

EDIT:

There have been discussions around whether an "invalid" hint might lead to an insertion at a position that does not then reflect the "order of insertion" for values with the same key. This might be concluded from a general multimap statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".

I did not find support for the one or the other point of view in any normative document. I just found the following statement in cplusplus multimap/emplace_hint documentation:

emplate <class... Args>
  iterator emplace_hint (const_iterator position, Args&&... args);

position Hint for the position where the element can be inserted. The function optimizes its insertion time if position points to the element that will follow the inserted element (or to the end, if it would be the last). Notice that this does not force the new element to be in that position within the multimap container (the elements in a multimap always follow a specific order). const_iterator is a member type, defined as a bidirectional iterator type that points to elements.

I know that this is not a normative reference, but at least my Apple LLVM 8.0 compiler adheres to this definition (see demo above): If one inserts an element with a "wrong" hint, i.e. one pointing even before the position where a pair shall be inserted, the algorithm recognizes this and chooses a valid position (see inserting "z"=>"a" where a hint points to an "x"-element). If we use a range for key "x" and use range.first, the position right before the first x is interpreted as a valid position.

So: I think that m.emplace_hint(r.first,... behaves in a way that the algorithm chooses a valid position immediately, and that to a position close to hint overrules the general statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".

like image 2
Stephan Lechner Avatar answered Nov 13 '22 06:11

Stephan Lechner