Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why would I see ~20% speed increase using native code?

Tags:

Any idea why this code:

extern "C" __declspec(dllexport) void Transform(double x[], double y[], int iterations, bool forward) {     long n, i, i1, j, k, i2, l, l1, l2;     double c1, c2, tx, ty, t1, t2, u1, u2, z;      /* Calculate the number of points */     n = (long)pow((double)2, (double)iterations);      /* Do the bit reversal */     i2 = n >> 1;     j = 0;     for (i = 0; i < n - 1; ++i)     {         if (i < j)         {             tx = x[i];             ty = y[i];             x[i] = x[j];             y[i] = y[j];             x[j] = tx;             y[j] = ty;         }         k = i2;         while (k <= j)         {             j -= k;             k >>= 1;         }         j += k;     }      /* Compute the FFT */     c1 = -1.0;      c2 = 0.0;     l2 = 1;     for (l = 0; l < iterations; ++l)     {         l1 = l2;         l2 <<= 1;         u1 = 1;          u2 = 0;         for (j = 0; j < l1; j++)          {             for (i = j; i < n; i += l2)              {                 i1 = i + l1;                 t1 = u1 * x[i1] - u2 * y[i1];                 t2 = u1 * y[i1] + u2 * x[i1];                 x[i1] = x[i] - t1;                  y[i1] = y[i] - t2;                 x[i] += t1;                 y[i] += t2;             }             z = u1 * c1 - u2 * c2;             u2 = u1 * c2 + u2 * c1;             u1 = z;         }         c2 = sqrt((1.0 - c1) / 2.0);         if (forward)              c2 = -c2;         c1 = sqrt((1.0 + c1) / 2.0);     }      /* Scaling for forward transform */     if (forward)     {         for (i = 0; i < n; ++i)         {             x[i] /= n;             y[i] /= n;         }     } } 

runs 20% faster than this code?

public static void Transform(DataSet data, Direction direction) {     double[] x = data.Real;     double[] y = data.Imag;     data.Direction = direction;     data.ExtremeImag = 0.0;     data.ExtremeReal = 0.0;     data.IndexExtremeImag = 0;     data.IndexExtremeReal = 0;      long n, i, i1, j, k, i2, l, l1, l2;     double c1, c2, tx, ty, t1, t2, u1, u2, z;      /* Calculate the number of points */     n = (long)Math.Pow(2, data.Iterations);      /* Do the bit reversal */     i2 = n >> 1;     j = 0;     for (i = 0; i < n - 1; ++i)     {         if (i < j)         {             tx = x[i];             ty = y[i];             x[i] = x[j];             y[i] = y[j];             x[j] = tx;             y[j] = ty;         }         k = i2;         while (k <= j)         {             j -= k;             k >>= 1;         }         j += k;     }      /* Compute the FFT */     c1 = -1.0;      c2 = 0.0;     l2 = 1;     for (l = 0; l < data.Iterations; ++l)     {         l1 = l2;         l2 <<= 1;         u1 = 1;          u2 = 0;         for (j = 0; j < l1; j++)          {             for (i = j; i < n; i += l2)              {                 i1 = i + l1;                 t1 = u1 * x[i1] - u2 * y[i1];                 t2 = u1 * y[i1] + u2 * x[i1];                 x[i1] = x[i] - t1;                  y[i1] = y[i] - t2;                 x[i] += t1;                 y[i] += t2;             }             z = u1 * c1 - u2 * c2;             u2 = u1 * c2 + u2 * c1;             u1 = z;         }         c2 = Math.Sqrt((1.0 - c1) / 2.0);         if (direction == Direction.Forward)              c2 = -c2;         c1 = Math.Sqrt((1.0 + c1) / 2.0);     }      /* Scaling for forward transform */     if (direction == Direction.Forward)     {         for (i = 0; i < n; ++i)         {             x[i] /= n;             y[i] /= n;             if (Math.Abs(x[i]) > data.ExtremeReal)             {                 data.ExtremeReal = x[i];                 data.IndexExtremeReal = (int)i;             }             if (Math.Abs(y[i]) > data.ExtremeImag)             {                 data.ExtremeImag = y[i];                 data.IndexExtremeImag = (int)i;             }         }     } } 

FFT http://www.rghware.com/fft.png

I created the drop in CPU seen in the middle of the graph by selecting the “Native DLL FFT” in my app:

http://www.rghware.com/InstrumentTuner.zip (source code)

I think this will run on most PCs. You’ll need to have DirectX installed. I had some issues using the capture settings for certain hardware. The capture settings were supposed to be configurable, but the app’s development has been sidetracked by this interesting find.

Anyway, why I’m seeing a 20% increase in speed using the native code? This seems to fly in the face of some of the assumptions I previously had.

UPDATE

After converting the function to an unsafe method and fixing the long/int issue. The new unsafe method actually runs faster than the native method (pretty cool).

Profile http://www.rghware.com/profile.png

It's obvious that the array bound checking is the cause of the 20% slow down in this FFT method. Due to it's nature, the for-loops in this method cannot be optimized.

Thanks everyone for the help.

like image 734
Robert H. Avatar asked May 19 '09 16:05

Robert H.


1 Answers

Just looking at this code, I'd suspect from my experience a fairly significant slowdown going from C++ -> C#.

One major issue you're going to face in a naive port of a routine like this to C# is that C# is going to add bounds checking on every array check here. Since you're never looping through the arrays in a way that will get optimized (see this question for details), just about every array access is going to receive bounds checking.

In addition, this port is pretty close to a 1->1 mapping from C. If you run this through a good .NET profiler, you'll probably find some great spots that can be optimized to get this back to near C++ speed with one or two tweaks (that's nearly always been my experience in porting routines like this).

If you want to get this to be at nearly identical speed, though, you'll probably need to convert this to unsafe code and use pointer manipulation instead of directly setting the arrays. This will eliminate all of the bounds checking issues, and get your speed back.


Edit: I see one more huge difference, which may be the reason your C# unsafe code is running slower.

Check out this page about C# compared to C++, in particular:

"The long type: In C#, the long type is 64 bits, while in C++, it is 32 bits."

You should convert the C# version to use int, not long. In C#, long is a 64bit type. This may actually have a profound effect on your pointer manipulation, because I believe you are inadvertantly adding a long->int conversion (with overflow checking) on every pointer call.

Also, while you're at it, you may want to try wrapping the entire function in an unchecked block. C++ isn't doing the overflow checking you're getting in C#.

like image 185
Reed Copsey Avatar answered Oct 21 '22 08:10

Reed Copsey