I have a very large nested for loop in which some multiplications and additions are performed on floating point numbers.
for (int i = 0; i < length1; i++)
{
    double aa = 0;
    for (int h = 0; h < 10; h++)
    {
        aa += omega[i][outsideGeneratedAddress[h]];
    }
    double alphaOld = alpha;
    alpha = Math.Sqrt(alpha * alpha + aa * aa);
    s = -aa / alpha;
    c = alphaOld / alpha;
    for (int j = 0; j <= i; j++)
    {
        double oldU = u[j];
        u[j] = c * oldU + s * omega[i][j];
        omega[i][j] = c * omega[i][j] - s * oldU;
    }
}
This loop is taking up the majority of my processing time and is a bottleneck.
Would I be likely to see any speed improvements if I rewrite this loop in C and interface to it from C#?
EDIT: I updated the code to show how s and c are generated. Also, the inner loop actually goes from 0 to i, though that probably doesn't make much difference to the question.
EDIT2: I implemented the algorithm in VC++ and linked it with C# through a DLL, and saw a 28% speed boost over C# with all optimisations enabled. The compiler option that enables SSE2 works particularly well. Compiling with MinGW and GCC 4.4 only gave a 15% speed boost. I also tried the Intel compiler and saw a 49% speed boost for this code.
Performance: C++ is widely used where higher-level languages are not efficient enough. Well-optimised C++ code is often faster than equivalent C# code, which makes it a common choice for applications where performance is important.
Performance based on the nature of the language: C++ is an object-oriented programming language that supports features such as polymorphism, abstract data types, and encapsulation. Object orientation does not in itself make C++ faster than C; comparable code generally compiles to similar machine code in both languages.
Updated:
What happens if you write the inner loop to take account of locality of reference:
for (int i = 0; i < length1; i++)
{
    s = GetS(i);
    c = GetC(i);
    double[] omegaTemp = omega[i];
    for (int j = 0; j < length2; j++)
    {
        double oldU = u[j];
        u[j] = c * oldU + s * omegaTemp[j];
        omegaTemp[j] = c * omegaTemp[j] - s * oldU;
    }
}
Use an unsafe block and pointers to index into your omega array. This will remove the overhead of range checking and may be a significant win if you do enough accesses. A lot of time may also be being spent in your GetS() and GetC() functions, which you didn't provide source for.