Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to find frequencies of each unique value in the std::vector

Given a vector std::vector<double> v, we can find unique elements efficiently by:

std::vector<double> uv(v.begin(), v.end());
std::sort(uv.begin(), uv.end());
std::erase(std::unique(uv.begin, uv.end()), uv.end());

What would the be the nicest way (without loops, with STL or lambdas) to create a vector:

std::vector<double> freq_uv(uv.size());

which would contain frequencies of each distinct element appearing in v (order the same as sorted unique values)?

Note: type can be anything, not just double

like image 966
Oleg Shirokikh Avatar asked Dec 26 '22 15:12

Oleg Shirokikh


2 Answers

After you sort, before you erase:

std::vector<int> freq_uv;
freq_uv.push_back(0);
auto prev = uv[0];        // you should ensure !uv.empty() if previous code did not already ensure it.
for (auto const & x : uv)
{
    if (prev != x)
    {
        freq_uv.push_back(0);
        prev = x;
    }
    ++freq_uv.back();
}

Note that, while I generally like to count occurences with a map, as Yakk is doing, in this case I think it is doing a lot of unnecessary work as we already know the vector is sorted.

Another possibility is to use a std::map (not unordered), instead of sorting. This will get your frequencies first. Then, since the map is ordered, you can just create the sorted, unique vector, and the frequency vector directly from the map.

// uv not yet created
std::map<T, int> freq_map;
for (auto const & x : v)
    ++freq_map[x];
std::vector<T> uv;
std::vector<int> freq_uv;
for (auto const & p : freq_map)
{
    uv.push_back(p.first);
    freq_uv.push_back(p.second);
}
like image 163
Benjamin Lindley Avatar answered Dec 28 '22 05:12

Benjamin Lindley


First, note that == and to a lesser extent < on double is often a poor idea: often you'll have values that logically "should" be equal if the double was infinite precision, but are slightly different.

However, collecting the frequencies is easy:

template<typename T, typename Allocator>
std::unordered_map< T, std::size_t > frequencies( std::vector<T, Allocator> const& src ) {
  std::unordered_map< T, std::size_t > retval;
  for (auto&& x:src)
    ++retval[x];
  return retval;
}

assuming std::hash<T> is defined (which it is for double). If not, there is more boilerplate, so I'll skip it. Note that this does not care if the vector is sorted.

If you want it in the form of std::vector<std::size_t> in sync with your sorted vector, you can just do this:

template<typename T, typename Hash, typename Equality, typename Allocator>
std::vector<std::size_t> collate_frequencies(
  std::vector<T, Allocator> const& order,
  std::unordered_map<T, std::size_t, Hash, Equality> const& frequencies
) {
  std::vector<std::size_t> retval;
  retval.reserve(order.size());
  for( auto&& x : order )
    retval.push_back( frequencies[x] );
  return retval;
}

I took the liberty of making these functions overly generic, so they support more than just doubles.

like image 24
Yakk - Adam Nevraumont Avatar answered Dec 28 '22 04:12

Yakk - Adam Nevraumont