I need to read a massive amount of data into a buffer (about 20gig). I have 192gb of very fast DDram available, so no issue with memory size. However, I am finding that the following code runs slower and slower the further it gets into the buffer. The Visual C profiler tells me that 68% of the 12 minute execution time is in the 2 statements inside the loop in myFunc(). I am running win7, 64bit on a very fast dell with 2 cpu's, 6 physical cores each (24 logical cores), and all 24 cores are completely maxed out while running this.
#define TREAM_COUNT 9000
#define ARRAY_SIZE ONE_BILLION
#define offSet(a,b,c,d) ( ((size_t) ARRAY_SIZE * (a)) + ((size_t) TREAM_COUNT * 800 * (b)) + ((size_t) 800 * (c)) + (d) )
void myFunc(int dogex, int ptxIndex, int xtreamIndex, int carIndex)
{
short *ptx = (short *) calloc(ARRAY_SIZE * 20, sizeof(short));
#pragma omp parallel for
for (int bIndex = 0; bIndex < 800; ++bIndex)
doWork(dogex, ptxIndex, carIndex);
}
void doWork(int dogex, int ptxIndex, int carIndex)
{
for (int treamIndex = 0; treamIndex < ONE_BILLION; ++treamIndex)
{
short ptxValue = ptx[ offSet(dogex, ptxIndex, treamIndex, carIndex) ];
short lastPtxValue = ptx[ offSet(dogex, ptxIndex-1, treamIndex, carIndex) ];
// ....
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With