Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest Array addressing

I am running an image analysis code on an array storing information about the image. Unfortunately the code is very heavy and takes an average of 25s to run through a single frame. The main problem I see is the array addressing. Which is the fastest to run through a 2d array and are there at all any differences in

horizontal then vertical

for (int y = 0; y < array.Length; ++y)
    for (int x = 0; x < array[].Length; ++x)
        //Code using array[y][x]

and vertical then horrizontal?

for (int x = 0; x < array[].Length; ++x)
    for (int y = 0; y < array.Length; ++y)
        //Code using array[y][x]

Furthermore, I tried to avoid direct addressing and use pointers instead.

for (int y = 0; y < array.Length; ++y)
    int* ptrArray = (int*)array[0];
    for (int x = 0; x < array[].Length; ++x, ++ptrArray)
        //Code using ptrArray for array[y][x]

or

for (int x = 0; x < array[].Length; ++x)
    int* ptrArray = (int*)array[0];
    for (int y = 0; y < array.Length; ++y, ptrArray += array[].Length)
        //Code using ptrArray for array[y][x]

Any help is greatly appreciated. Max

like image 396
Max Z. Avatar asked Dec 13 '11 10:12

Max Z.


3 Answers

The most important rule is that it's all theory until you profile. I don't hold with those who insist that profiling is everything (without some theory you're no better than a Cargo Cultist putting coconuts on their ears and waiting for the plane to come) but your theory can always be wrong or incomplete, so profiling is crucial.

Generally, we want the inner-scan to be horizontally (in terms of the array, rather than the image, though for most formats that's the same). The reason being that with an array like:

00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29

It is going to be laid out as:

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

You want to be scanning along contiguous blocks that can be loaded into CPU caches and then used entirely, rather than scanning from block to block and needing to regularly change the CPU cache contents.

This is even more important if you try to parallelise the algorithm. You want each thread dealing with its own contiguous blocks of memory as far as both input and output goes, rather than not only suffering the way single-threaded code does with poor cache-hit-frequence but also causing each other's buffers to be dirtied and need refreshing. This can be the difference between parallelising leading to a speed boost and parallelising actually slowing things down.

Another thing is the difference between a 2-dimensional array byte[,] rather than an array of arrays byte[][], which your comment in your question "array[y][x]" makes me wonder if perhaps you are using. With the former to obtain arr[1,2] the logic is:

  1. Check Bounds
  2. Calculate position (simple fast arithmetic)
  3. Retrieve value.

With the latter, the logic is:

  1. Check bounds
  2. Obtain array through pointer.
  3. Check bounds
  4. Retrieve value.

There is also less good memory cache-hit-frequence. The latter has benefits when "jagged" structures are needed, but that is not the case here. 2D is almost always faster than array of arrays.

Things I don't see as likely to help, but I would certainly try them in your situation:

You may find a boost from doing your 1d <=> 2d logic. Have a single-dimension array where idx = y * width + x. It shouldn't make an appreciable difference, but it's worth trying.

Optimisations do try to both hoist calls to .Length and omit needless bounds checking, so you may find manually hoisting and switching to pointer arithmetic doesn't gain anything, but in a case where you really do need to bring time down it is certainly worth profiling.

Finally. Have you profiled how fast your code is at scanning the array and doing nothing? It could be that another part of the code is the real bottleneck, and you're fixing the wrong thing.

like image 151
Jon Hanna Avatar answered Sep 18 '22 06:09

Jon Hanna


One option is to use reverse looping (start your for() loop from array.Length down to 0)

That'll speed things up abit.

for example,

for (int x = array[].Length-1; x >= 0; --x)
    int* ptrArray = (int*)array[0];
    for (int y = array.Length-1; y >= 0 ; --y, ptrArray += array[].Length)
        //Code using ptrArray for array[y][x]
like image 34
Shai Avatar answered Sep 21 '22 06:09

Shai


I have no idea, but you've already come up with the examples. So you could run your code samples in a loop and profile it yourself.

var sw = new Stopwatch();
sw.Start();
ExecuteMyCode();
sw.Stop();
Console.WriteLine("Time: " + sw.Elapsed);

You might be able to speed up your processing by using a multi-threading construct like Parallel.ForEach. This would work well if the code in your loop avoids dependencies between loop iterations.

like image 23
Merlyn Morgan-Graham Avatar answered Sep 18 '22 06:09

Merlyn Morgan-Graham