I have an unsorted vector of doubles (Actually objects with a double member which is used in this case). From this vector I need to remove the smallest non-unique value. However, it is not guaranteed that a non-unique value exists. It is allowed to sort the range.
As always I started with looking for an std::algorithm and found std::unique. In my first idea I would use this in combination with std::sort to move all non-unique values to the end of the vector and then use min_element over the non-unique values. However, std::unique will leave the non-unique values at the end in an unspecified state. And indeed I lost all non POD members.
Does anyone have a suggestion how to do this efficiently? It is important to do it efficiently since the code is used in the bottleneck of the program (which is already a bit too slow).
Well, if you can sort the range then this is easy. Sort it in ascending order, then just iterate through until you encounter two equivalent, adjacent elements. Done.
Something like this:
T findSmallestNonunique(std::vector<T> v)
{
std::sort(std::begin(v), std::end(v));
auto it = std::adjacent_find(std::begin(v), std::end(v));
if (it == std::end(v))
throw std::runtime_error("No such element found");
return *it;
}
Here's a demonstration:
#include <vector>
#include <algorithm>
#include <stdexcept>
#include <iostream>
template <typename Container>
typename Container::value_type findSmallestNonunique(Container c)
{
std::sort(std::begin(c), std::end(c));
auto it = std::adjacent_find(std::begin(c), std::end(c));
if (it == std::end(c))
throw std::runtime_error("No such element found");
return *it;
}
int main(int argc, char** argv)
{
std::vector<int> v;
for (int i = 1; i < argc; i++)
v.push_back(std::stoi(argv[i]));
std::cout << findSmallestNonunique(v) << std::endl;
}
// g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp \
// && ./a.out 1 2 2 3 4 5 5 6 7 \
// && ./a.out 5 2 8 3 9 3 0 1 4 \
// && ./a.out 5 8 9 2 0 1 3 4 7
//
// 2
// 3
// terminate called after throwing an instance of 'std::runtime_error'
// what(): No such element found
Note that here I'm not performing the search in-place, but I could do by taking the container by reference. (This relies on you being allowed to sort the original input.)
Due to the sort operation, this could be as "bad" as O(N×log(N)), but it's simple and easy to maintain, and does not require any allocations/copies (except for a single copy of the entire dataset which, as stated above, you may trivially completely avoid). You may want to use something else if your input is large or you are expecting to have a failed match in a majority of cases. As always: profile!
You can do this in (expected) linear time.
use an unordered_map
to count the elements. This is (expected) linear in the number of values.
Find the smallest item among non-uniques using a naive loop.
Here is a possible implementation:
#include <unordered_map>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
const vector<double> elems{1, 3.2, 3.2, 2};
unordered_map<double, size_t> c;
for(const double &d: elems)
++c[d];
bool has = false;
double min_;
for(const auto &e: c)
if(e.second > 1)
{
min_ = has? min(e.first, min_): e.first;
has = true;
}
cout << boolalpha << has << " " << min_ << endl;
return 0;
}
Edit As Howard Hinnant & Lightness Races In Orbit have pointed out, this contains both allocations & hashes. It will therefore be linear but with a relatively large factor. Other, sorting-based solutions, might work better for small sizes. When/if profiling, it's important to use a good allocator, e.g., Google's tcmalloc
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With