Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid heap allocation when inserting into sorted vector<unique_ptr<pair<>>>

Tags:

c++

c++11

I have some data stored in: std::vector<std::unique_ptr<std::pair<Key, Data>>> where Data is some large sized object, and Key uniquely identifies Data. We can assume there are no duplicates on Key and this vector is sorted ascending based on Key.

I have implemented insert in following way (following standard containers)

bool compare(const std::unique_ptr<std::pair<Key,Data>>& elem,
             const std::unique_ptr<std::pair<Key,Data>>& input)
{
  return elem->first < input->first;
}

typedef std::vector<std::unique_ptr<std::pair<Key, Data>>> DataStore;

std::pair<DataStore::iterator, bool>
insert(DataStore& vec, const Key& k, const Data& d)
{
  using namespace std::placeholders; // for using bind

  // vvv-- This heap allocation seems unnecessary when element already exists.
  // seems mainly needed for lower_bound to work
  std::unique_ptr<std::pair<Key,Data>> valPtr(new std::pair<Key,Data>(k,d));

  auto location = std::lower_bound(std::begin(vec),
                                   std::end(vec), valPtr,
                                   std::bind(&compare, _1, _2));
  // exists, return element location 
  if(location != vec.end() && (*location)->first == k) {
    return std::make_pair(location, false);
  }

  // non-existing element, add it to the right location
  auto addedLocation = vec.emplace(location, std::move(valPtr));
  return std::make_pair(addedLocation, true);
}

Could someone suggest ways to avoid allocation in insert in the comment location above?

I would prefer not to write my own implementation of lower_bound/binary_search.

like image 206
Chintan Avatar asked Apr 09 '26 04:04

Chintan


2 Answers

std::lower_bound doesn't require any relationship between the type T of the object that we are searching for and the type of the elements, other than the fact that we have to be able to compare them with cmp(element, value)1. We can make use of that fact:

std::pair<DataStore::iterator, bool>
insert(DataStore& vec, const Key& k, const Data& d)
{
  // Note: since you are using C++11, you can use lambdas rather than `std::bind`.
  // This simplifies your code and makes it easier for the compiler to optimize
  auto location = std::lower_bound(std::begin(vec), std::end(vec), k,
    [](const std::unique_ptr<std::pair<Key,Data>>& elem, const Key& key) {
      return elem->first < key;
    });

  // exists, return element location 
  if(location != vec.end() && (*location)->first == k) {
    return std::make_pair(location, false);
  }

  // non-existing element, add it to the right location
  auto addedLocation = vec.emplace(location, new std::pair<Key,Data>(k,d));
  return std::make_pair(addedLocation, true);
}

1: A lot of the standard algorithms do this. They are designed to be as general as possible, so they place as few requirements on the types passed in as is reasonable.

like image 131
Justin Avatar answered Apr 11 '26 17:04

Justin


Luckily, std::lower_bound and similar functions don't actually require the functor to take two arguments of the same type.

lower_bound expects a functor such that the dereferenced iterator can be used as the first argument and the object to search for can be used as the second argument:

auto location = std::lower_bound(
    v.begin(), v.end(),
    std::make_pair(k,d),
    [](const std::unique_ptr<std::pair<Key, Data>>& ptr,
       const std::pair<Key, Data>& val)
    { return ptr->first < val.first; } );
like image 32
aschepler Avatar answered Apr 11 '26 18:04

aschepler