Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to get middle (median) of an std::set?

Tags:

c++

set

stl

median

std::set is a sorted tree. It provides begin and end methods so I can get minimum and maximum and lower_bound and upper_bound for binary search. But what if I want to get iterator pointing to the middle element (or one of them if there are even number of elements there)?

Is there an efficient way (O(log(size)) not O(size)) to do that?

{1} => 1
{1,2} => 1 or 2
{1,2,3} => 2
{1,2,3,4} => 2 or 3 (but in the same direction from middle as for {1,2})
{1,312,10000,14000,152333} => 10000

PS: Same question in Russian.

like image 452
Qwertiy Avatar asked Nov 19 '17 11:11

Qwertiy


People also ask

How to get middle value in set c++?

size() / 2); int median = *it; If you want to include duplicates when considering the median you can use std::multiset ( {1, 2, 3, 3, 3, 3, 3, 3, 3} median's would be 3 ) : std::multiset<int>::iterator it = s.

How to find middle element in multiset in c++?

There's no other way, unless you know in advance what the middle element is, or you use a different data structure. It is possible, but non portable solution with hacking of encapsulation. E.g. gcc std::multiset<> uses _Rb_tree where _M_begin() just points on the middle of set.

How do you find the elements of a set?

You can't access set elements by index. You have to access the elements using an iterator. set<int> myset; myset. insert(100); int setint = *myset.

What is a multiset in C++?

Multisets are a type of associative containers similar to the set, with the exception that multiple elements can have the same values. Some Basic Functions associated with multiset: begin() – Returns an iterator to the first element in the multiset –> O(1)


2 Answers

Depending on how often you insert/remove items versus look up the middle/median, a possibly more efficient solution than the obvious one is to keep a persistent iterator to the middle element and update it whenever you insert/delete items from the set. There are a bunch of edge cases which will need handling (odd vs even number of items, removing the middle item, empty set, etc.), but the basic idea would be that when you insert an item that's smaller than the current middle item, your middle iterator may need decrementing, whereas if you insert a larger one, you need to increment. It's the other way around for removals.

At lookup time, this is of course O(1), but it also has an essentially O(1) cost at each insertion/deletion, i.e. O(N) after N insertions, which needs to be amortised across a sufficient number of lookups to make it more efficient than brute forcing.

like image 68
pmdj Avatar answered Sep 20 '22 14:09

pmdj


This suggestion is pure magic and will fail if there are some duplicated items

Depending on how often you insert/remove items versus look up the middle/median, a possibly more efficient solution than the obvious one is to keep a persistent iterator to the middle element and update it whenever you insert/delete items from the set. There are a bunch of edge cases which will need handling (odd vs even number of items, removing the middle item, empty set, etc.), but the basic idea would be that when you insert an item that's smaller than the current middle item, your middle iterator may need decrementing, whereas if you insert a larger one, you need to increment. It's the other way around for removals.

Suggestions

  1. first suggestion is to use a std::multiset instead of std::set, so that it can work well when items could be duplicated
  2. my suggestion is to use 2 multisets to track the smaller potion and the bigger potion and balance the size between them

Algorithm

1. keep the sets balanced, so that size_of_small==size_of_big or size_of_small + 1 == size_of_big

void balance(multiset<int> &small, multiset<int> &big)
{
    while (true)
    {
        int ssmall = small.size();
        int sbig = big.size();

        if (ssmall == sbig || ssmall + 1 == sbig) break; // OK

        if (ssmall < sbig)
        {
            // big to small
            auto v = big.begin();
            small.emplace(*v);
            big.erase(v);
        }
        else 
        {
            // small to big
            auto v = small.end();
            --v;
            big.emplace(*v);
            small.erase(v);
        }
    }
}

2. if the sets are balanced, the medium item is always the first item in the big set

auto medium = big.begin();
cout << *medium << endl;

3. take caution when add a new item

auto v = big.begin();
if (v != big.end() && new_item > *v)
    big.emplace(new_item );
else
    small.emplace(new_item );

balance(small, big);

complexity explained

  • it is O(1) to find the medium value
  • add a new item takes O(log n)
  • you can still search a item in O(log n), but you need to search 2 sets
like image 43
Clark Avatar answered Sep 19 '22 14:09

Clark