Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimization of a C++ code (that uses UnorderedMap and Vector)

I am trying to optimize some part of a C++ code that is taking a long time (the following part of the code takes about 19 seconds for X amount of data, and I am trying to finish the whole process in less than 5 seconds for the same amount of data - based on some benchmarks that I have). I have a function "add" that I have written and copied the code here. I will try to explain as much as possible that I think is needed to understand the code. Please let me know if I have missed something.

The following function add is called X times for X amount of data entries.

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data

   // hashTableMap is an unordered_map<int, std::vector<PointObject>>

   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            std::vector<PointObject> pointVectorList = it->second;
            pointVectorList.push_back(vector);
            it->second = pointVectorList;
        }
   }
}
like image 749
ParthN Avatar asked Jul 23 '15 20:07

ParthN


3 Answers

You are doing a lot of useless operations... if I understand correctly, a simplified form could be simply:

void HashTable::add(const PointObject& vector) {
   hashTableMap[hash(vector)].push_back(vector);    
}

This works because

  • A map when accessed using operator[] will create a default-initialized value if it's not already present in the map
  • The value (an std::vector) is returned by reference so you can directly push_back the incoming point to it. This std::vector will be either a newly inserted one or a previously existing one if the key was already in the map.

Note also that, depending on the size of PointObject and other factors, it could be possibly more efficient to pass vector by value instead of by const PointObject&. This is the kind of micro optimization that however requires profiling to be performed sensibly.

like image 102
6502 Avatar answered Sep 28 '22 08:09

6502


Instead of calling hashTableMap.count(combinedHash) and hashTableMap.find(combinedHash), better just insert new element and check what insert() returned:

In versions (1) and (2), the function returns a pair object whose first element is an iterator pointing either to the newly inserted element in the container or to the element whose key is equivalent, and a bool value indicating whether the element was successfully inserted or not.

Moreover, do not pass objects by value, where you don't have to. Better pass it by pointer or by reference. This:

std::vector<PointObject> pointVectorList = it->second;

is inefficient since it will create an unnecessary copy of the vector.

like image 37
Adam Stelmaszczyk Avatar answered Sep 28 '22 07:09

Adam Stelmaszczyk


This .count() is totally unecessary, you could simplify your function to:

void HashTable::add(PointObject vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        std::vector<PointObject> pointVectorList = it->second;
        pointVectorList.push_back(vector);
        it->second = pointVectorList;
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

You are also performing copy operations everywhere. Copying an object is time consuming, avoid doing that. Also use references and pointers when possible:

void HashTable::add(PointObject& vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        it->second.push_back(vector);
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

This code can probably be optimized further, but it would require knowing hash(), knowing the way hashTableMap works (by the way, why is it not a std::map?) and some experimentation.

If hashTableMap was a std::map<int, std::vector<pointVectorList>>, you could simplify your function to this:

void HashTable::add(PointObject& vector)
{
    hashTableMap[hash(vector)].push_back(vector);
}

And if it was a std::map<int, std::vector<pointVectorList*>> (pointer) you can even avoid that last copy operation.

like image 34
Havenard Avatar answered Sep 28 '22 08:09

Havenard