
algorithm to combine std::unique with a reduce step?

Tags: c++, algorithm, stl

Can someone come up with a clean (and fast) solution to the following problem?

  • I have a sequence of entries that each hold a key and a value, say a
struct Value {
    int index = 0;
    int cost = 0;
};
  • I now want to merge entries such that each key is contained only once, but the values should be combined - i.e. each index should occur only once in the sequence, and the cost of each duplicate index should be accumulated.

The basic solution I came up with sorts the sequence; when the BinaryPredicate passed to std::sort detects two entries with equal indices, the cost of rhs is summed into lhs and then set to 0. A std::remove_if pass then removes the zero-cost entries. See here for an example:

#include <cstdlib>
#include <vector>
#include <algorithm>
#include <iostream>

struct Value
{
    int index = 0;
    int cost = 0;
};

// generate a bunch of random values in a vector
// values will have indices and costs in range [0..9]
std::vector<Value> generator()
{
    std::vector<Value> v(20);
    std::generate(v.begin(), v.end(), []() { return Value{std::rand() % 10, std::rand() % 10}; });
    return v;
}

void print(const std::vector<Value> &values)
{
    for (const auto &v : values)
        std::cout << "{i=" << v.index << ", c=" << v.cost << "}, ";
    std::cout << "\n";
}

// merge entries with equal indices in-place
void merge(std::vector<Value> &values)
{
    // sort values; when the predicate sees two entries with equal
    // indices, it accumulates the cost into lhs and zeroes rhs -
    // this mutation through the predicate is the questionable part
    std::sort(values.begin(), values.end(), [](auto &lhs, auto &rhs) {
        if (lhs.index == rhs.index) {
            lhs.cost += rhs.cost;
            rhs.cost = 0;
        }
        return lhs.index < rhs.index;
    });
    // remove the zeroed-out duplicates (note: this also drops any
    // entry whose cost happened to be 0 to begin with)
    auto it = std::remove_if(values.begin(), values.end(), [](const auto &v) { return v.cost == 0; });
    values.erase(it, values.end());
}

int main()
{
    auto v = generator();
    std::cout << "generated values: ";
    print(v);

    merge(v);
    std::cout << "merged values: ";
    print(v);

}

Live on Compiler Explorer

Thing is: while the example above produces the correct results, it is, from what I can tell, not conforming to the C++ standard. A BinaryPredicate "shall not apply any non-constant function through the dereferenced iterators" (http://eel.is/c++draft/algorithms.requirements#8.sentence-4), and Compare is a BinaryPredicate (http://eel.is/c++draft/alg.sorting#general-2.sentence-1).

Does this mean that my only option is to roll a custom inplace_unique_reduce or similar, or is there an alternative, elegant approach to this problem? I would prefer not to write my own non-trivial algorithm for this.

Thanks

asked Dec 14 '22 by milianw

1 Answer

Assuming you are ok with additional allocations, I would use std::map (or std::unordered_map):

// requires <map>, <vector> and <algorithm>
auto merge_entries(std::vector<Value>& original_values) {
    auto values = std::map<int, int>();

    // accumulate the total cost per index
    for (const auto [index, cost] : original_values) {
        values[index] += cost;
    }

    // copy the merged entries back into the original vector;
    // std::transform returns an iterator past the last element written
    const auto end_of_merged_values = std::transform(
            values.cbegin(), values.cend(), original_values.begin(),
            [](const auto& entry) {
                return Value{entry.first, entry.second};
            }
    );

    // drop the leftover tail that was not overwritten
    original_values.erase(end_of_merged_values, original_values.end());
}

Apart from one for() loop (which could be replaced with std::for_each, although such a change would introduce unnecessary boilerplate and result in harder-to-read code, in my opinion), this solution uses only the STL.

We first merge all the entries using the map and then overwrite the front of the original std::vector with the merged entries. What's super convenient is that std::transform returns an iterator pointing to the end of the written range. Why is that beneficial? Because, apart from the unlikely scenario where no merging occurs at all, we end up with fewer elements than were originally passed in. Using that iterator we can erase the rest of the vector (the non-overwritten elements), keeping things in a clean, STL-like style.
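For reference, here is a minimal self-contained sketch of calling it (the sample data is made up; note that because std::map iterates in key order, the merged entries come back sorted by index):

#include <algorithm>
#include <iostream>
#include <map>
#include <vector>

struct Value { int index = 0; int cost = 0; };

// same technique as above, repeated here only for self-containment
auto merge_entries(std::vector<Value>& original_values) {
    auto values = std::map<int, int>();
    for (const auto [index, cost] : original_values)
        values[index] += cost;
    const auto end_of_merged_values = std::transform(
            values.cbegin(), values.cend(), original_values.begin(),
            [](const auto& entry) { return Value{entry.first, entry.second}; });
    original_values.erase(end_of_merged_values, original_values.end());
}

int main() {
    auto v = std::vector<Value>{{1, 1}, {2, 2}, {2, 7}, {0, 5}, {1, 2}};
    merge_entries(v);
    for (const auto [index, cost] : v)
        std::cout << "{i=" << index << ", c=" << cost << "}, ";
    std::cout << '\n'; // prints: {i=0, c=5}, {i=1, c=3}, {i=2, c=9},
}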


Assuming you are not ok with additional allocations, but you are ok with strengthening your iterator requirements (to bidirectional), I would use std::partial_sum and std::unique:

// requires <numeric> (std::partial_sum), <iterator> and <algorithm>
template <class BiDirIt, class BinaryPredicateCompare, class BinaryOpReduce>
auto inplace_unique_reduce(
        BiDirIt first, BiDirIt last,
        BinaryPredicateCompare cmp,
        BinaryOpReduce reduce
) {
    // scan right-to-left so that each run's accumulated value
    // ends up in the run's first (leftmost) element
    std::partial_sum(
            std::make_reverse_iterator(last), std::make_reverse_iterator(first),
            std::make_reverse_iterator(last),
            [cmp, reduce](auto acc, const auto& elem) {
                if (cmp(acc, elem)) {
                    return reduce(acc, elem); // same key: keep accumulating
                } else {
                    acc = elem;               // new run: restart the accumulator
                }
                return acc;
            }
    );

    // std::unique keeps the first element of each run - exactly
    // the one now holding the accumulated value
    return std::unique(first, last, cmp);
}

used like so:

auto values = std::vector<Value>{
        {1, 1}, {2, 2}, {2, 7}, {0, 5},
        {3, 3}, {1, 2}, {3, 10}
};
auto comparator = [](const auto& lhs, const auto& rhs) {
    return lhs.index == rhs.index;
};
auto reducer = [](const auto& lhs, const auto& rhs) {
    return Value{lhs.index, lhs.cost + rhs.cost};
};

auto to_remove = inplace_unique_reduce(
        values.begin(), values.end(),
        comparator,
        reducer
);

values.erase(to_remove, values.end());

for (const auto [index, cost] : values) {
    std::cout << index << ' ' << cost << '\n';
}

Just like your original approach, this will not merge non-adjacent elements; to do that you either have to sort them by index first (a conforming sort is sketched below) or use something like the map from the first part of my answer.
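Such a pre-sort only needs to compare the keys and must not mutate through the predicate - something along these lines:

// sort by index only; the predicate is a pure comparison, so the
// BinaryPredicate requirements are satisfied
std::sort(values.begin(), values.end(),
          [](const Value& lhs, const Value& rhs) {
              return lhs.index < rhs.index;
          });
// now equal indices are adjacent and inplace_unique_reduce can merge them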

The std::make_reverse_iterator calls are necessary because std::partial_sum accumulates the merged value into the rightmost element of each group of consecutive, equivalent elements, while std::unique preserves only the first element of such a group. Because of this, you want to merge the elements in the reverse order relative to the one you will be std::unique-ing in.
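To see the mechanics in isolation, here is a small self-contained sketch (with made-up data) that runs just the reverse partial_sum step and then the std::unique step:

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

struct Value { int index = 0; int cost = 0; };

int main() {
    auto v = std::vector<Value>{{2, 2}, {2, 7}, {3, 3}, {3, 10}};

    // reverse partial_sum: each run's total ends up in its first element
    std::partial_sum(v.rbegin(), v.rend(), v.rbegin(),
            [](Value acc, const Value& elem) {
                if (acc.index == elem.index) {
                    return Value{acc.index, acc.cost + elem.cost};
                }
                return elem; // new run: restart the accumulator
            });
    // v is now {2,9}, {2,7}, {3,13}, {3,10}

    // std::unique keeps the first element of each run, i.e. the run total
    auto to_remove = std::unique(v.begin(), v.end(),
            [](const Value& lhs, const Value& rhs) {
                return lhs.index == rhs.index;
            });
    v.erase(to_remove, v.end());

    for (const auto [index, cost] : v) {
        std::cout << index << ' ' << cost << '\n'; // prints: 2 9, then 3 13
    }
}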


You raised some concerns about situations where copying or moving is expensive - in such cases you are either left with a custom solution that takes your specific constraints into consideration, or you ease those constraints. Here we move-assign the merged entries, but that is it as far as potential bottlenecks go. If your move assignment operator is expensive, I fear no standard solution will work for you, and you will have to roll your own, as you suggested in your question.

answered Dec 31 '22 by Fureeish