I am running an image analysis code on an array storing information about the image. Unfortunately the code is very heavy and takes an average of 25s to run through a single frame. The main problem I see is the array addressing. Which is the fastest to run through a 2d array and are there at all any differences in
horizontal then vertical
for (int y = 0; y < array.Length; ++y)
for (int x = 0; x < array[].Length; ++x)
//Code using array[y][x]
and vertical then horrizontal?
for (int x = 0; x < array[].Length; ++x)
for (int y = 0; y < array.Length; ++y)
//Code using array[y][x]
Furthermore, I tried to avoid direct addressing and use pointers instead.
for (int y = 0; y < array.Length; ++y)
int* ptrArray = (int*)array[0];
for (int x = 0; x < array[].Length; ++x, ++ptrArray)
//Code using ptrArray for array[y][x]
or
for (int x = 0; x < array[].Length; ++x)
int* ptrArray = (int*)array[0];
for (int y = 0; y < array.Length; ++y, ptrArray += array[].Length)
//Code using ptrArray for array[y][x]
Any help is greatly appreciated. Max
The most important rule is that it's all theory until you profile. I don't hold with those who insist that profiling is everything (without some theory you're no better than a Cargo Cultist putting coconuts on their ears and waiting for the plane to come) but your theory can always be wrong or incomplete, so profiling is crucial.
Generally, we want the inner-scan to be horizontally (in terms of the array, rather than the image, though for most formats that's the same). The reason being that with an array like:
00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
It is going to be laid out as:
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
You want to be scanning along contiguous blocks that can be loaded into CPU caches and then used entirely, rather than scanning from block to block and needing to regularly change the CPU cache contents.
This is even more important if you try to parallelise the algorithm. You want each thread dealing with its own contiguous blocks of memory as far as both input and output goes, rather than not only suffering the way single-threaded code does with poor cache-hit-frequence but also causing each other's buffers to be dirtied and need refreshing. This can be the difference between parallelising leading to a speed boost and parallelising actually slowing things down.
Another thing is the difference between a 2-dimensional array byte[,]
rather than an array of arrays byte[][]
, which your comment in your question "array[y][x]" makes me wonder if perhaps you are using. With the former to obtain arr[1,2] the logic is:
With the latter, the logic is:
There is also less good memory cache-hit-frequence. The latter has benefits when "jagged" structures are needed, but that is not the case here. 2D is almost always faster than array of arrays.
Things I don't see as likely to help, but I would certainly try them in your situation:
You may find a boost from doing your 1d <=> 2d logic. Have a single-dimension array where idx = y * width + x. It shouldn't make an appreciable difference, but it's worth trying.
Optimisations do try to both hoist calls to .Length
and omit needless bounds checking, so you may find manually hoisting and switching to pointer arithmetic doesn't gain anything, but in a case where you really do need to bring time down it is certainly worth profiling.
Finally. Have you profiled how fast your code is at scanning the array and doing nothing? It could be that another part of the code is the real bottleneck, and you're fixing the wrong thing.
One option is to use reverse looping (start your for() loop
from array.Length
down to 0)
That'll speed things up abit.
for example,
for (int x = array[].Length-1; x >= 0; --x)
int* ptrArray = (int*)array[0];
for (int y = array.Length-1; y >= 0 ; --y, ptrArray += array[].Length)
//Code using ptrArray for array[y][x]
I have no idea, but you've already come up with the examples. So you could run your code samples in a loop and profile it yourself.
var sw = new Stopwatch();
sw.Start();
ExecuteMyCode();
sw.Stop();
Console.WriteLine("Time: " + sw.Elapsed);
You might be able to speed up your processing by using a multi-threading construct like Parallel.ForEach
. This would work well if the code in your loop avoids dependencies between loop iterations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With