 

How to optimize copying chunks of an array in C#?

I am writing a live-video imaging application and need to speed up this method. It's currently taking about 10ms to execute and I'd like to get it down to 2-3ms.

I've tried both Array.Copy and Buffer.BlockCopy, and they both take ~30ms, which is 3x longer than the manual copy.

One thought was to copy the 4 bytes as a single 32-bit integer (read them as one integer and write them as one integer), reducing the four assignments to one. However, I'm not sure how to do that.
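A minimal sketch of the "copy 4 bytes as one integer" idea without unsafe code, assuming .NET Core 2.1 or later, where Span<T> and MemoryMarshal.Cast are built in (on .NET Framework they come from the System.Memory package). The class and method names are hypothetical. MemoryMarshal.Cast reinterprets the byte arrays as uint views without copying anything, so each pixel becomes one 32-bit load and one 32-bit store.

```csharp
using System;
using System.Runtime.InteropServices;

public static class LookupCopy
{
    // Hypothetical name. Copies each 4-byte BGRA lookup entry with a single
    // uint assignment by viewing both byte arrays as spans of uint.
    public static byte[] ApplyLookupTableSpan(byte[] lookupTable, ushort[] inputBuffer)
    {
        byte[] outputBuffer = new byte[inputBuffer.Length * 4];

        // Reinterpret the same memory as uint; no data is copied here.
        // Trailing bytes beyond a multiple of 4 are simply not part of the view.
        ReadOnlySpan<uint> lut = MemoryMarshal.Cast<byte, uint>(lookupTable.AsSpan());
        Span<uint> output = MemoryMarshal.Cast<byte, uint>(outputBuffer.AsSpan());

        for (int i = 0; i < inputBuffer.Length; i++)
        {
            int entry = inputBuffer[i];
            // Same rule as the original method: values with no table entry
            // leave that output pixel as zeros.
            if (entry < lut.Length)
            {
                output[i] = lut[entry];
            }
        }

        return outputBuffer;
    }
}
```

Because lut holds lookupTable.Length / 4 entries, comparing the pixel value against lut.Length is the correct bounds check for an entry index.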

Another thought was to somehow use pointers and unsafe code to do this, but I'm not sure how to do that either.

All help is much appreciated. Thank you!

EDIT: Array sizes are: inputBuffer[327680], lookupTable[16384], outputBuffer[1310720]

public byte[] ApplyLookupTableToBuffer(byte[] lookupTable, ushort[] inputBuffer)
{
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    sw.Start();

    // Precalculate and initialize the variables
    int lookupTableLength = lookupTable.Length;
    int bufferLength = inputBuffer.Length;
    byte[] outputBuffer = new byte[bufferLength * 4];
    int outIndex = 0;
    int curPixelValue = 0;

    // For each pixel in the input buffer...
    for (int curPixel = 0; curPixel < bufferLength; curPixel++)
    {
        outIndex = curPixel * 4;                    // Calculate the corresponding index in the output buffer
        curPixelValue = inputBuffer[curPixel] * 4;  // Retrieve the pixel value and multiply by 4 since the lookup table has 4 values (blue/green/red/alpha) for each pixel value

        // If the multiplied pixel value falls within the lookup table...
        if ((curPixelValue + 3) < lookupTableLength)
        {
            // Copy the lookup table value associated with the value of the current input buffer location to the output buffer
            outputBuffer[outIndex + 0] = lookupTable[curPixelValue + 0];
            outputBuffer[outIndex + 1] = lookupTable[curPixelValue + 1];
            outputBuffer[outIndex + 2] = lookupTable[curPixelValue + 2];
            outputBuffer[outIndex + 3] = lookupTable[curPixelValue + 3];

            //System.Buffer.BlockCopy(lookupTable, curPixelValue, outputBuffer, outIndex, 4);   // Takes 2-10x longer than just copying the values manually
            //Array.Copy(lookupTable, curPixelValue, outputBuffer, outIndex, 4);                // Takes 2-10x longer than just copying the values manually
        }
    }

    System.Diagnostics.Debug.WriteLine("ApplyLookupTableToBuffer(ms): " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
    return outputBuffer;
}
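With the sizes from the edit above, the lookup table holds 16384 / 4 = 4096 four-byte entries, so only pixel values 0 to 4095 have a color, and the output is 327680 × 4 = 1310720 bytes. One option not taken from this thread is to widen the table once into a uint[65536] whose unused tail is zero: every ushort is then a valid index, so the `if ((curPixelValue + 3) < lookupTableLength)` check in the loop above disappears, while out-of-range pixels still come out as zeros. A sketch assuming .NET Core 2.1+ for MemoryMarshal.Cast; BuildPaddedTable and ApplyPaddedTable are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PaddedLookup
{
    // Hypothetical helper: one uint entry per possible ushort value.
    // Entries with no source data stay 0, matching the original method,
    // which leaves out-of-range pixels as zeros.
    public static uint[] BuildPaddedTable(byte[] lookupTable)
    {
        uint[] padded = new uint[ushort.MaxValue + 1];
        // Copy only whole 4-byte entries, capped at the padded table's size.
        int bytesToCopy = Math.Min(lookupTable.Length / 4, padded.Length) * 4;
        Buffer.BlockCopy(lookupTable, 0, padded, 0, bytesToCopy);
        return padded;
    }

    // Hypothetical hot loop: no range check on the pixel value is needed,
    // because a ushort can never exceed padded.Length - 1.
    public static byte[] ApplyPaddedTable(uint[] padded, ushort[] inputBuffer)
    {
        byte[] outputBuffer = new byte[inputBuffer.Length * 4];
        Span<uint> output = MemoryMarshal.Cast<byte, uint>(outputBuffer.AsSpan());

        for (int i = 0; i < inputBuffer.Length; i++)
        {
            output[i] = padded[inputBuffer[i]];
        }

        return outputBuffer;
    }
}
```

The padded table would be built once per lookup-table change and reused for every frame. The single Buffer.BlockCopy call moves the whole 16 KB table at once, which is the bulk case where BlockCopy does well, unlike one call per 4-byte pixel. Whether this beats the pointer version depends on the machine, so it is worth measuring.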

EDIT: I've updated the method, keeping the same variable names, so others can see how the code translates to HABJAN's solution below.

    public byte[] ApplyLookupTableToBufferV2(byte[] lookupTable, ushort[] inputBuffer)
    {
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();

        // Precalculate and initialize the variables
        int lookupTableLength = lookupTable.Length;
        int bufferLength = inputBuffer.Length;
        byte[] outputBuffer = new byte[bufferLength * 4];
        //int outIndex = 0;
        int curPixelValue = 0;

        unsafe
        {
            fixed (byte* pointerToOutputBuffer = &outputBuffer[0])
            fixed (byte* pointerToLookupTable = &lookupTable[0])
            {
                // Cast to integer pointers since groups of 4 bytes get copied at once
                uint* lookupTablePointer = (uint*)pointerToLookupTable;
                uint* outputBufferPointer = (uint*)pointerToOutputBuffer;

                // For each pixel in the input buffer...
                for (int curPixel = 0; curPixel < bufferLength; curPixel++)
                {
                    // No need to multiply by 4 on the following 2 lines since the pointers are for integers, not bytes
                    // outIndex = curPixel;  // This line is commented since we can use curPixel instead of outIndex
                    curPixelValue = inputBuffer[curPixel];  // Retrieve the pixel value 

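                    // curPixelValue is now an entry index, not a byte offset, so compare it
                    // against the number of 4-byte entries to avoid reading past the table
                    if (curPixelValue < lookupTableLength / 4)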
                    {
                        outputBufferPointer[curPixel] = lookupTablePointer[curPixelValue];
                    }
                }
            }
        }

        System.Diagnostics.Debug.WriteLine("2 ApplyLookupTableToBuffer(ms): " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
        return outputBuffer;
    }
asked Jan 13 '14 at 17:01 by nb1forxp


1 Answer

I ran some tests, and I got the best speed by switching to unsafe code and calling the RtlMoveMemory API. Buffer.BlockCopy and Array.Copy turned out to be much slower than calling RtlMoveMemory directly.

In the end, you get something like this:

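// Requires "using System.Runtime.InteropServices;" and compiling with /unsafe.
// outIndex and curPixelValue are the byte offsets from the question's loop.
fixed (byte* ptrOutput = &outputBuffer[0])
fixed (byte* ptrInput = &lookupTable[0])
{
    MoveMemory(ptrOutput + outIndex, ptrInput + curPixelValue, 4);
}

// P/Invoke into kernel32 (Windows only)
[DllImport("Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
private static unsafe extern void MoveMemory(void* dest, void* src, int size);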
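For pointer-to-pointer copies, Buffer.MemoryCopy (available since .NET Framework 4.6) is a managed, cross-platform alternative to P/Invoking RtlMoveMemory. Like RtlMoveMemory, it has per-call overhead, so it suits large blocks such as a whole frame rather than individual 4-byte pixels. A sketch assuming unsafe code is enabled in the project; BulkCopy and CopyFrame are hypothetical names.

```csharp
using System;

public static class BulkCopy
{
    // Hypothetical example: copy a whole buffer in one call instead of
    // 4 bytes at a time. Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.
    public static unsafe byte[] CopyFrame(byte[] source)
    {
        byte[] destination = new byte[source.Length];

        // Pin both arrays so the GC cannot move them during the copy.
        fixed (byte* src = source)
        fixed (byte* dst = destination)
        {
            // MemoryCopy(void* source, void* destination,
            //            long destinationSizeInBytes, long sourceBytesToCopy)
            Buffer.MemoryCopy(src, dst, destination.Length, source.Length);
        }

        return destination;
    }
}
```

The destinationSizeInBytes argument lets MemoryCopy throw instead of overrunning the destination, which a raw RtlMoveMemory call does not do.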

EDIT:

OK, now that I've figured out your logic and run some tests, I managed to roughly halve the run time of your method. Since you always copy small blocks of data (4 bytes), you were right: RtlMoveMemory won't help here, and it's better to copy the data as an integer. Here is the final solution I came up with:

public static byte[] ApplyLookupTableToBufferV2(byte[] lookupTable, ushort[] inputBuffer)
{
    int lookupTableLength = lookupTable.Length;
    int bufferLength = inputBuffer.Length;
    byte[] outputBuffer = new byte[bufferLength * 4];
    int outIndex = 0, curPixelValue = 0;

    unsafe
    {
        fixed (byte* ptrOutput = &outputBuffer[0])
        fixed (byte* ptrLookup = &lookupTable[0])
        {
            uint* lkp = (uint*)ptrLookup;
            uint* opt = (uint*)ptrOutput;

            for (int index = 0; index < bufferLength; index++)
            {
                outIndex = index;
                curPixelValue = inputBuffer[index];

                // curPixelValue indexes 4-byte entries, so bound it by the entry count
                if (curPixelValue < lookupTableLength / 4)
                {
                    opt[outIndex] = lkp[curPixelValue];
                }
            }
        }
    }

    return outputBuffer;
}

I renamed your method to ApplyLookupTableToBufferV1.

And here are my test results:

int tc1 = Environment.TickCount;

for (int i = 0; i < 200; i++)
{
    byte[] a = ApplyLookupTableToBufferV1(lt, ib);
}

tc1 = Environment.TickCount - tc1;

Console.WriteLine("V1: " + tc1.ToString() + "ms");

Result - V1: 998 ms

int tc2 = Environment.TickCount;

for (int i = 0; i < 200; i++)
{
    byte[] a = ApplyLookupTableToBufferV2(lt, ib);
}

tc2 = Environment.TickCount - tc2;

Console.WriteLine("V2: " + tc2.ToString() + "ms");

Result - V2: 473 ms
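Environment.TickCount is limited to the system timer resolution, typically 10 to 16 ms, so the loops above give only coarse totals. A sketch of a somewhat more careful harness: it uses Stopwatch, makes one warm-up call so JIT compilation is not counted, and builds synthetic input at the question's sizes (inputBuffer[327680], lookupTable[16384]). The random data, the Time helper, and ApplyManual (a hypothetical stand-in for the question's V1 loop) are assumptions for illustration; timings will vary by machine.

```csharp
using System;
using System.Diagnostics;

public static class LookupBenchmark
{
    // Runs `work` once to warm up, then times `iterations` calls with
    // Stopwatch and returns the average milliseconds per call.
    public static double Time(Func<byte[]> work, int iterations)
    {
        work(); // warm-up: exclude JIT compilation from the measurement
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    // Hypothetical stand-in for the question's V1 byte-by-byte loop.
    public static byte[] ApplyManual(byte[] lookupTable, ushort[] inputBuffer)
    {
        byte[] outputBuffer = new byte[inputBuffer.Length * 4];
        for (int i = 0; i < inputBuffer.Length; i++)
        {
            int lutIndex = inputBuffer[i] * 4;
            if (lutIndex + 3 < lookupTable.Length)
            {
                outputBuffer[i * 4 + 0] = lookupTable[lutIndex + 0];
                outputBuffer[i * 4 + 1] = lookupTable[lutIndex + 1];
                outputBuffer[i * 4 + 2] = lookupTable[lutIndex + 2];
                outputBuffer[i * 4 + 3] = lookupTable[lutIndex + 3];
            }
        }
        return outputBuffer;
    }

    public static void Main()
    {
        // Synthetic data at the sizes given in the question.
        var rng = new Random(42);
        byte[] lookupTable = new byte[16384];
        rng.NextBytes(lookupTable);
        ushort[] inputBuffer = new ushort[327680];
        for (int i = 0; i < inputBuffer.Length; i++)
        {
            inputBuffer[i] = (ushort)rng.Next(0, 4096); // 4096 = 16384 / 4 entries
        }

        double ms = Time(() => ApplyManual(lookupTable, inputBuffer), 200);
        Console.WriteLine("Manual copy: " + ms.ToString("N3") + " ms per call");
    }
}
```

Other variants can be compared the same way by passing them as lambdas, e.g. `Time(() => ApplyLookupTableToBufferV2(lookupTable, inputBuffer), 200)`. Running a Release build outside the debugger gives more representative numbers.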

answered Sep 22 '22 at 21:09 by HABJAN