gprof says that my high computing app spends 53% of its time inside std::vector <...> operator [] (unsigned long)
, 32% of which goes to one heavily used vector. Worse, I suspect that my parallel code failing to scale beyond 3-6 cores is due to a related memory bottleneck. While my app does spend a lot of time accessing and writing memory, it seems like I should be able (or at least try) to do better than 52%. Should I try using dynamic arrays instead (size remains constant in most cases)? Would that be likely to help with possible bottlenecks?
Actually, my preferred solution would be to solve the bottleneck and leave the vectors as is for convenience. Based on the above, are there any likely culprits or solutions (tcmalloc is out)?
Did you examine your memory access pattern itself? It might be inefficient - cache unfriendly.
Did you try to use raw pointer while array accessing?
// regular place
for (int i = 0; i < arr.size(); ++i)
wcout << arr[i];
// In bottleneck
int *pArr = &arr.front();
for (int i = 0; i < arr.size(); ++i)
wcout << pArr[i];
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With