Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to insert a new value to a set and erase another at the same time?

Tags:

c++

c++11

set

stl

Each set contains elements in a specified order. I want to specify a bound on the size of a set and automatically delete the last element if a new one strictly smaller (in terms of the order) is inserted and the specified size is already reached.

Of course, I could do something like the following:

class bounded_set
{
private:
    using set = std::set<Key, Compare, Allocator>;
    using iterator = typename set::iterator;

public:
    bounded_set(std::size_t size)
        : m_size(size)
    { }

    std::pair<iterator, bool> insert(Key const& value)
    {
        if (m_set.size() < m_size)
            return m_set.insert(value);

        auto last = std::prev(m_set.end());
        if (Compare()(value, *last))
        {
            m_set.erase(last);
            return m_set.insert(value);
        }
        return std::make_pair(last, false);
    }

private:
    set m_set;
    std::size_t m_size;
};

Besides the fact, that bounded_set is not the best name (since bounded containers are a well known thing in the world of concurrent programming), I'm worried about memory allocation in this implementation. Most likely, at first, the space used by last will be freed. But immediately after that, new memory needs to be allocated for value.

What I really would like to do, is using the memory allocated for last and copy over the data for value to this place, while preserving the order.

like image 202
0xbadf00d Avatar asked Oct 07 '14 23:10

0xbadf00d


2 Answers

If I'm understanding your questions correctly, depending on how the underlying data-structure works, that's not necessarily going to be possible without you having to write a custom memory allocator or use one from a library. For instance, std::set uses a red-black tree as the underlying data-structure. Thus the memory location of the nodes and the relational pointers to-and-from those nodes are intrinsically attached to the total order of the tree. You can't re-use the memory from a node that is the "least" value and place another value there that is not a new totally-ordered "least" value without also re-sorting all the pointers to that node so that it's in the proper place in the tree for the value of that node.

If you're still concerned about memory usage and want to stick with the STL, rather than std::set, maybe you should look into a fixed-length priority-queue or something of that nature that uses an array-based heap as the underlying data-structure so that memory is not constantly allocated and re-allocated for new nodes.

like image 163
Jason Avatar answered Sep 29 '22 10:09

Jason


I see a couple of options for you, and a lost opportunity by the standards committee that would have easily solved your problem.

N3586 proposed a solution to your problem.

std::pair<iterator, bool> insert(Key const& value)
{
    if (m_set.size() < m_size)
        return m_set.insert(value);

    auto last = std::prev(m_set.end());
    if (Compare()(value, *last))
    {
        auto temp = m_set.remove(last);
        *temp = value;
        return m_set.insert(temp);
    }
    return std::make_pair(last, false);
}

In this hypothetical rewrite, temp is a node_ptr that allows non-const access to the node's value_type. You can remove the node, write to it, and re-insert it, all without any allocation for nodes.

The committee politely declined this proposal.

A custom allocator for std::set could do the trick in a less elegant manner. Such an allocator would simply cache nodes, and your existing insert will just work. One slight disadvantage with this approach is that while the custom allocator keeps your node from being deallocated, it can not keep your Key from being destructed, and then constructed, when you change it. Some types are more efficient in assignment, than they are in a destruction-construction cycle. And sometimes the former can be noexcept while the latter can not be.

In all, I view the custom allocator approach as a last resort. You can make it work. But it takes some carefully planned, and non-intuitive code.

The use of push_heap, pop_heap comes to mind. However its use is awkward if you really need an iterator to the inserted or equal element returned. If you can deal with a void return type, it might look like:

void insert(Key const& value)
{
    if (m_set.size() < m_size)
    {
        m_set.push_back(value);
        std::push_heap(m_set.begin(), m_set.end(), Compare{});
    }

    if (Compare()(value, m_set.front()))
    {
        std::pop_heap(m_set.begin(), m_set.end(), Compare{});
        m_set.back() = value;
        std::push_heap(m_set.begin(), m_set.end(), Compare{});
    }
}

But it is awkward to search the heap for the newly inserted value, and push_heap doesn't provide this information.

Still another option is a sorted vector + insertion sort. You'll have to write insertion sort yourself, but that is a relatively minor programming task. The reason you want insertion sort is that you will always be sorting a sorted array except for the last element. And insertion sort is optimal for this job.

None of these solutions is perfect, and none but N3586 offer anything approaching a solution "out of the box", i.e. one that doesn't require more than a handful of lines of code. And N3586 doesn't exist. If you think it should exist, contact your C++ national body representative, and tell them so. Or get involved in the C++ committee yourself, and lobby for it.

like image 35
Howard Hinnant Avatar answered Sep 29 '22 11:09

Howard Hinnant