I am writing a code for a mathematical method (Incomplete Cholesky) and I have hit a curious roadblock. Please see the following simplified code.
for(k=0;k<nosUnknowns;k++)
{
//Pieces of code
for(i=k+1;i<nosUnknowns;i++)
{
// more code
}
for(j=k+1;j<nosUnknowns;j++)
{
for(i=j;i<nosUnknowns;i++)
{
//Some more code
if(xOk && yOk && zOk)
{
if(xDF == 1 && yDF == 0 && zDF == 0)
{
for(row=0;row<3;row++)
{
for(col=0;col<3;col++)
{
// All 3x3 static arrays This is the line
statObj->A1_[row][col] -= localFuncArr[row][col];
}
}
}
}
}//Inner loop i ends here
}//Inner loop j ends here
}//outer loop k ends here
For context,
statObj is an object containing a number of 3x3 static double arrays. I am initializing statObj by a call to new function. Then I am populating the arrays inside it using some mathematical functions. One such array is A1_. The value of variable nosUnknowns is around 3000. The array localFuncArr is previously generated by matrix multiplication and is a double array.
Now this is my problem:
When I use the line as shown in the code, the code runs extremely sluggishly. Something like 245secs for the whole function.
When I comment out the said line, the code performs extremely fast. It takes something like 6 secs.
Now when I replace the said line with the following line : localFuncArr[row][col] += 3.0
, again the code runs with the same speed as that of case(2) above.
Clearly something about the call to statObj->A1_
is making the code run slow.
My question(s):
Is Cache Poisoning the reason why this is happening ?
If so, what could be changed in terms of array initialization/object initialization/loop unrolling or for that matter any form of code optimization that can speed this up ?
Any insights to this from experienced folks is highly appreciated.
EDIT: Changed the description to be more verbose and redress some of the points mentioned in the comments.
If the conditions are mostly true, your line of code is executed 3000x3000x3000x3x3 times. That's about 245 billion times. Depending on your hardware architecture 245 seconds might be a very reasonable timing (that's 1 iteration every 2 cycles - assuming 2GHz processor). In any case there isn't anything in the code that suggests cache poisoning.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With