
C++: alternative to a vector of references to avoid copying large data

I have spent some time looking for answers but didn't find anything that was satisfactory.

I'm interested in how more seasoned C++ people solve this kind of problem, now that I am doing more production-related coding than prototyping.

Let's say you have a class that has an unordered_map (hash map) that holds a lot of data, say 500 MB. You want to write an accessor that returns some subset of that data in an efficient manner.

Take the following, where BigData is some class that stores a moderate amount of data.

class A
{
   private:
      unordered_map<string, BigData> m_map;   // lots of data

   public:

    vector<BigData>   get10BestItems()
    {
        vector<BigData>  results;
        for ( ........  // iterate over m_map and add 10 best items to results
        // ... 
       return results;
    }

};

The accessor get10BestItems is not very efficient in this code because it first copies the items to the results vector, then the results vector is copied when the function is returned (copying from the function stack).
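
A caveat on the second copy: since C++11, returning a local vector by value either moves it or elides the copy entirely, so the elements are not copied again on return. Only the first copy (from the map into the vector) remains. A minimal sketch to check this, using a hypothetical copy-counting type Counted in place of BigData (Counted, makeResults and usageExample are illustrative names, not from the question):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for BigData that counts how often it is copied.
struct Counted {
    static int copies;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
};
int Counted::copies = 0;

// Copies every element of source into a local vector and returns it by value.
std::vector<Counted> makeResults(const std::vector<Counted>& source)
{
    std::vector<Counted> results;
    results.reserve(source.size());   // avoid reallocation copies while filling
    for (const Counted& item : source) {
        results.push_back(item);      // one copy per element
    }
    return results;                   // moved or elided: no element is copied here
}

// Usage check: only the push_back copies are counted, none for the return.
inline void usageExample()
{
    std::vector<Counted> source(3);
    Counted::copies = 0;
    std::vector<Counted> out = makeResults(source);
    assert(out.size() == 3);
    assert(Counted::copies == 3);
}
```

This assumes a C++11 or later compiler. It does not remove the cost of copying each BigData into the vector, which is what the remaining options try to avoid.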

You can't have a vector of references in C++ for various reasons, which would otherwise be the obvious answer:

vector<BigData&> results;     // vector can't contain references.
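
One standard-library workaround that is not discussed in this thread is std::reference_wrapper from <functional>, which can be stored in a vector and converts implicitly back to a reference. The sketch below is only an illustration: the BigData layout, its score field, and the itemsAtLeast function are assumptions made up for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the BigData class in the question.
struct BigData {
    int score;              // assumed ranking field, not in the original
    std::string payload;
};

// Collects references to every item whose score is at least minScore.
// No BigData object is copied; each element wraps a reference into the map.
std::vector<std::reference_wrapper<const BigData>>
itemsAtLeast(const std::unordered_map<std::string, BigData>& map, int minScore)
{
    std::vector<std::reference_wrapper<const BigData>> results;
    for (const auto& entry : map) {
        if (entry.second.score >= minScore) {
            results.push_back(std::cref(entry.second));
        }
    }
    return results;
}

// Usage check: the wrapper refers to the object stored in the map itself.
inline void usageExample()
{
    std::unordered_map<std::string, BigData> map;
    map["a"] = BigData{5, "first"};
    map["b"] = BigData{1, "second"};
    const auto hits = itemsAtLeast(map, 3);
    assert(hits.size() == 1);
    assert(&hits[0].get() == &map.at("a"));
}
```

A caller can write for (const BigData& item : hits) and use item like an ordinary reference. As with raw pointers, the wrapped references dangle if the referenced map entries are erased.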

You could create the results vector on the heap and pass a reference to that:

vector<BigData>&   get10BestItems()    // returns a reference to the vector
    {
        vector<BigData>*  results = new vector<BigData>;   // allocate on the heap
        for ( ........  // iterate over m_map and add 10 best items to results
            // ... 
       return *results;   // returns a reference; the caller must delete the vector later
    } 

But then you are going to run into memory leak issues if you are not careful. It is also slow (heap memory) and still copies data from the map to the vector.

So we can fall back on C-style coding and just use pointers:

vector<BigData*>   get10BestItems()    // returns a vector of pointers
    {
        vector<BigData*>  results ; // vectors of pointers
        for ( ........  // iterate over m_map and add 10 best items to results
        // ... 
       return results;  
    } 

But most sources say not to use pointers unless absolutely necessary. There are options such as smart pointers and the Boost ptr_vector, but I would rather avoid these if possible.

I do know that the map is going to be static, so I am not too worried about bad pointers. The one issue is that the calling code will have to be different to handle pointers. Stylistically this is not pleasant:

const BigData&   getTheBestItem()    // returns a const reference
{
       string bestID;
       for ( ........  // iterate over m_map, find bestID
       // ... 
       return m_map[bestID] ; // return a reference to the best item
}


vector<BigData*>   get10BestItems()    // returns a vector of pointers
{    
        vector<BigData*>  results ; // vectors of pointers
        for_each ........  // iterate over m_map and add 10 best items to results
        // ... 
       return results;  
}

E.g., if you want a single item then it is easy to return a reference.

The final option is to simply make the hash map public and return a vector of keys (in this case strings):

class A
{
      public:

         unordered_map<string, BigData> m_map;   // lots of data



    vector<string>   get10BestItemKeys()
    {
        vector<string>  results;
        for (........  // iterate over m_map and add 10 best KEYS to results
        // ... 
       return results;
    }

};



A aTest;
... // load data to map

vector <string> best10 =  aTest.get10BestItemKeys();
for ( .... // iterate over all KEYs in best10
{
    aTest.m_map.find(KEY);  // do something with item.
    // ...
} 

What is the best solution? Speed is important but I want ease of development and safe programming practices.

user1978816 asked Dec 12 '25 05:12


1 Answer

I would just go with a vector of pointers if the map is constant. You can always return const pointers if you want to avoid the data being changed.

References are great for when they work but there's a reason we still have pointers (for me this would fall under the category of being 'necessary').
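
A minimal sketch of what that could look like, assuming "best" means the highest value of a hypothetical score field (the question does not say how items are ranked). The BigData layout, the add helper, and the getBestItems name are illustrative assumptions, not the asker's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the BigData class in the question.
struct BigData {
    int score;              // assumed ranking field, not in the original
    std::string payload;
};

class A {
public:
    // Illustrative helper for loading data into the map.
    void add(const std::string& key, const BigData& value) { m_map[key] = value; }

    // Returns up to n non-owning const pointers into m_map, best score first.
    // The pointers stay valid as long as the pointed-to entries are not erased.
    std::vector<const BigData*> getBestItems(std::size_t n) const
    {
        std::vector<const BigData*> results;
        results.reserve(m_map.size());
        for (const auto& entry : m_map) {
            results.push_back(&entry.second);   // copies a pointer, not the data
        }
        n = std::min(n, results.size());
        const auto middle = results.begin() + static_cast<std::ptrdiff_t>(n);
        std::partial_sort(results.begin(), middle, results.end(),
                          [](const BigData* a, const BigData* b) {
                              return a->score > b->score;
                          });
        results.resize(n);
        return results;
    }

private:
    std::unordered_map<std::string, BigData> m_map;   // lots of data
};

// Usage check: the two highest scores come back in descending order.
inline void usageExample()
{
    A a;
    a.add("x", BigData{3, "x"});
    a.add("y", BigData{9, "y"});
    a.add("z", BigData{5, "z"});
    const std::vector<const BigData*> best = a.getBestItems(2);
    assert(best.size() == 2);
    assert(best[0]->score == 9);
    assert(best[1]->score == 5);
}
```

Because the element type is const BigData*, callers can read through the pointers but cannot modify the stored data. This sketch collects a pointer to every entry before sorting, which is simple but uses memory proportional to the map size; a bounded heap of n pointers would avoid that.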

Tim MB answered Dec 14 '25 19:12


