I will be parsing 60GB of text and doing a lot of inserts and lookups in maps. I just started using boost::unordered_set and boost::unordered_map. As my program fills these containers they keep growing bigger and bigger, and I was wondering whether it would be a good idea to pre-allocate memory for them, something like mymap::get_allocator().allocate(N)?
Or should I just leave them to allocate and figure out growth factors by themselves? The code looks like this:
boost::unordered_map<string, long>  words_vs_frequency, wordpair_vs_frequency;
boost::unordered_map<string, float> word_vs_probability, wordpair_vs_probability,
                                    wordpair_vs_MI;
//... ... ...
N = words_vs_frequency.size();
long y = 0; float MIWij = 0.0f, maxMI = -999999.0f;
for (boost::unordered_map<string, long>::iterator i = wordpair_vs_frequency.begin();
     i != wordpair_vs_frequency.end(); ++i)
{
    if (i->second >= BIGRAM_OCCURANCE_THRESHOLD)
    {
        y++;
        Wij = i->first;
        WordPairToWords(Wij, Wi, Wj);
        MIWij = log( wordpair_vs_probability[Wij] /
                     (word_vs_probability[Wi] * word_vs_probability[Wj]) );
        // keep only the pairs whose MI value is greater than the threshold
        if (MIWij > MUTUAL_INFORMATION_THRESHOLD)
            wordpair_vs_MI[Wij] = MIWij;
        if (MIWij > maxMI)
            maxMI = MIWij;
    }
}
Thanks in advance
According to the documentation, both unordered_set and unordered_map have a method

void rehash(size_type n);

that regenerates the hash table so that it contains at least n buckets. It is roughly what reserve() does for other STL containers, except that rehash() takes a bucket count rather than an element count.
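For example, here is a minimal sketch of pre-sizing one of your maps. The 50-million-word figure is just a made-up placeholder for however many distinct keys you expect; since rehash() takes a bucket count, the expected element count is divided by the table's load factor:

#include <boost/unordered_map.hpp>
#include <string>

int main()
{
    boost::unordered_map<std::string, long> words_vs_frequency;

    // Hypothetical guess at how many distinct words the 60GB corpus holds.
    const std::size_t expected_words = 50000000;

    // rehash() takes a bucket count, so convert the expected element count
    // using the table's load factor; this avoids repeated rehashing while
    // the corpus is being parsed.
    words_vs_frequency.rehash(
        static_cast<std::size_t>(expected_words / words_vs_frequency.max_load_factor()) + 1);

    words_vs_frequency["example"] = 1;   // inserts now proceed without growth rehashes
    return 0;
}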
I would try it both ways, which will let you generate hard data showing whether one method works better than the other. We can speculate all day about which method will be optimal, but as with most performance questions, the best thing to do is try it out and see what happens (and then fix the parts that actually need fixing).
That being said, the Boost authors seem to be very smart, so it quite possibly will work fine as-is. You'll just have to test and see.
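A rough timing harness along these lines would give you that hard data. The element count and the synthetic keys below are placeholders, not a real benchmark; feeding it keys that resemble your actual corpus will make the comparison more meaningful:

#include <boost/unordered_map.hpp>
#include <chrono>
#include <iostream>
#include <string>

// Insert n synthetic keys, optionally pre-sizing the table first.
static double time_fill(std::size_t n, bool presize)
{
    boost::unordered_map<std::string, long> m;
    if (presize)
        m.rehash(static_cast<std::size_t>(n / m.max_load_factor()) + 1);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        ++m["word" + std::to_string(i)];
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    const std::size_t n = 5000000;   // placeholder size, not the real corpus
    std::cout << "default growth: " << time_fill(n, false) << " s\n";
    std::cout << "pre-sized:      " << time_fill(n, true)  << " s\n";
    return 0;
}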