Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Advantages and disadvantages of using a concatenated key rather than nested Map Container in C++

I have been given a project to investigate alternatives to the current container that is used to store Data in order to make it more efficient.

The current design involved 4 nested maps like that map< string, map< string, map< int, map< string, string> > > >

Lets Name each data field as Company, Department, ID_of_employee, Name

The time complexity at the moment to retrieve the the name of an employee given Company, Dept, ID is O(log N) and more precisely it involves three lookups.

Space complexity for now is not an issue.

My initial alternatives were the followings:

  • Use nested pairs to represent Company, Dept, Id and then use this nested pairs as key to a map. This seems not easily readable though.
  • Instead of nested pairs i thought about using a tuple or a struct which essentially as I read are not that different. After creating the new struct EmployeeKey that will contain fields for Company, Dept, ID. I can use it as a Key to a map. (i guess i have to write custom compare and less than operators also).
  • Use a concatenated key from company+Dept+ID by converting the int to string and concatenating them. Then feed this key to a map<ConcatenatedKey, Data>
  • Use Boost.MultiIndex. Even though this seems the best option I abandoned this option cause I found it a bit complicated to.

To give a little more info that is necessary. This Container is usually used to retrieve the final nested data and that is why I concluded to use the concatenated key approach. My question basically is, are there any caveats in using this sort of concatenated string? Is it a bad design or something we should avoid?

To my understanding this will improve the lookup time, still keep it logarithmic but perform one rather than four lookups thus it seems like an improvement.

like image 375
Nick Sorros Avatar asked Dec 19 '22 16:12

Nick Sorros


1 Answers

Since std::map<> is a Red-black tree, which is still a binary tree, the lookup speed is not that fast compared to hashmaps - especially if the number of entries is large.

Using an std::unordered_map<> (a hashmap) will give better performance, assuming the hash spread is good. I recommend fnv or MurmurHash3, since they have the nicest distribution of values.

Now, talking about nested containers - you should never, ever do such a thing! The overall performance can be quite horrible and the memory usage will definitely be very large, since it's essentially a 4-Dimensional RB-tree:

Lets put this into context, where you have 20 Companies, each company has 5 departments, each department has 12 EmployeeID's and each EmployeeID maps to a map of <Name, some_string> (the last bit seems a bit redundant, don't you think?).

  1. each Company leaf node is an std::map => 20 std::map instances
  2. each Department leaf node is an std::map => 20 + 20*5 = 120 std::map instances
  3. each EmployeeID leaf node is an std::map => 120 + 20*5*12 = 1320 std::map instances
  4. each Name leaf node is an std::map => 1320 + 20*5*12*1 = 2520 std::map instances

So you see, nesting containers is very dangerous, since even with a small data set you can end up with a huge amount of container instances. This has ridiculously bad performance, especially when the objects are being destroyed or new elements are inserted.

What I recommend: Use an EmployeeKey struct combined with std::unordered_map. This will give you a good lookup speed and only one std::unordered_map instance.

struct EmployeeKey
{
    int         CompanyID;  // if you want speed, CompanyID is the way to go
    std::string Department;
    int         EmployeeID;
    std::string Name;

    inline bool operator==(const EmployeeKey& key) const {
        return CompanyID != key.CompanyID && ... /* etc */;
    }
};

template<> struct hash<EmployeeKey> {
    size_t operator()(const EmployeeKey& key) const {
        /* perform hash combine here */
    }
};

And this should be enough to get you started. The final layout would look like this:

std::unordered_map<EmployeeKey, std::string> EmployeeData;
// usage:
auto it = EmployeeData.find(selectedEmployee);
if (it != EmployeeData.end())
    it->second = "Good employee";

If you really have to 'speed up' your lookups by every possible means, you can keep in mind that if CompanyID's are integers from [0 .. N], you can use an std::vector to get a fast first level index to right company:

std::vector<std::unordered_map<EmployeeKey2, std::string>> EmployeeData;
// usage:
auto& companyMap = EmployeeData[EBuyNLargeCorp]; // or [selectedCompany(0..N)]
auto it = companyMap.find(selectedEmployee);
if (it != companyMap.end())
    it->second = "Good employee!";

Where EmployeeKey2 would be missing the CompanyID field and selectedCompany would be the Index into the vector instead. But this is only something you do for really crucial performance gains.

like image 130
Jorma Rebane Avatar answered Feb 16 '23 02:02

Jorma Rebane