I am trying to optimize some part of a C++ code that is taking a long time (the following part of the code takes about 19 seconds for X amount of data, and I am trying to finish the whole process in less than 5 seconds for the same amount of data - based on some benchmarks that I have). I have a function "add" that I have written and copied the code here. I will try to explain as much as possible that I think is needed to understand the code. Please let me know if I have missed something.
The following function add is called X times for X amount of data entries.
void HashTable::add(PointObject vector) // PointObject is a user-defined object
{
int combinedHash = hash(vector); // the function "hash" takes less than 1 second for X amount of data
// hashTableMap is an unordered_map<int, std::vector<PointObject>>
if (hashTableMap.count(combinedHash) == 0)
{
// if the hashmap does not contain the combinedHash key, then
// add the key and a new vector
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
else
{
// otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(vector);
it->second = pointVectorList;
}
}
}
You are doing a lot of useless operations... if I understand correctly, a simplified form could be simply:
void HashTable::add(const PointObject& vector) {
hashTableMap[hash(vector)].push_back(vector);
}
This works because
operator[]
will create a default-initialized value if it's not already present in the mapstd::vector
) is returned by reference so you can directly push_back
the incoming point to it. This std::vector
will be either a newly inserted one or a previously existing one if the key was already in the map.Note also that, depending on the size of PointObject
and other factors, it could be possibly more efficient to pass vector
by value instead of by const PointObject&
. This is the kind of micro optimization that however requires profiling to be performed sensibly.
Instead of calling hashTableMap.count(combinedHash)
and hashTableMap.find(combinedHash)
, better just insert new element and check what insert()
returned:
In versions (1) and (2), the function returns a pair object whose first element is an iterator pointing either to the newly inserted element in the container or to the element whose key is equivalent, and a bool value indicating whether the element was successfully inserted or not.
Moreover, do not pass objects by value, where you don't have to. Better pass it by pointer or by reference. This:
std::vector<PointObject> pointVectorList = it->second;
is inefficient since it will create an unnecessary copy of the vector.
This .count()
is totally unecessary, you could simplify your function to:
void HashTable::add(PointObject vector)
{
int combinedHash = hash(vector);
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(vector);
it->second = pointVectorList;
}
else
{
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
}
You are also performing copy operations everywhere. Copying an object is time consuming, avoid doing that. Also use references and pointers when possible:
void HashTable::add(PointObject& vector)
{
int combinedHash = hash(vector);
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
it->second.push_back(vector);
}
else
{
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
}
This code can probably be optimized further, but it would require knowing hash()
, knowing the way hashTableMap
works (by the way, why is it not a std::map
?) and some experimentation.
If hashTableMap
was a std::map<int, std::vector<pointVectorList>>
, you could simplify your function to this:
void HashTable::add(PointObject& vector)
{
hashTableMap[hash(vector)].push_back(vector);
}
And if it was a std::map<int, std::vector<pointVectorList*>>
(pointer) you can even avoid that last copy operation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With