I have the following code which is the bottleneck in one part of my application. All I do is subtract on Array from another. Both of these arrays have more around 100000 elements. I'm trying to find a way to make this more performant.
var
Array1, Array2 : array of integer;
.....
// Code that fills the arrays
.....
for ix := 0 to length(array1)-1
Array1[ix] := Array1[ix] - Array2[ix];
end;
Does anybody have a suggestion?
C = A – B subtracts array B from array A by subtracting corresponding elements. The sizes of A and B must be the same or be compatible. If the sizes of A and B are compatible, then the two arrays implicitly expand to match each other.
subtract() function is used when we want to compute the difference of two array.It returns the difference of arr1 and arr2, element-wise. Parameters : arr1 : [array_like or scalar]1st Input array. arr2 : [array_like or scalar]2nd Input array.
The most straightforward way to subtract two matrices in NumPy is by using the - operator, which is the simplification of the np. subtract() method - NumPy specific method designed for subtracting arrays and other array-like objects such as matrices.
Implement a difference function, which subtracts one list from another and returns the result. It should remove all values from list a , which are present in list b keeping their order. Before reading the solution below, try to solve Array.
Running this on multiple threads, with that big an array will net linear speed-up. It's embarrassingly parallel as they say.
Running subtraction on more threads sounds good, but 100K integer sunstraction don't take a lot of CPU time, so maybe threadpool... However settings threads have also a lot of overhead, so short arrays will have slower productivity in parallel threads than in only one (main) thread!
Did you switch off in compiler settings, overflow and range checking?
You can try to use asm rutine, it is very simple...
Something like:
procedure SubArray(var ar1, ar2; length: integer);
asm
//length must be > than 0!
push ebx
lea ar1, ar1 -4
lea ar2, ar2 -4
@Loop:
mov ebx, [ar2 + length *4]
sub [ar1 + length *4], ebx
dec length
//Here you can put more folloving parts of rutine to more unrole it to speed up.
jz @exit
mov ebx, [ar2 + length *4]
sub [ar1 + length *4], ebx
dec length
//
jnz @Loop
@exit:
pop ebx
end;
begin
SubArray(Array1[0], Array2[0], length(Array1));
It can be much faster...
EDIT: Added procedure with SIMD instructions.
This procedure request SSE CPU support. It can take 4 integers in XMM register and subtract at once. There is also possibility to use movdqa
instead movdqu
it is faster, but you must first to ensure 16 byte aligment. You can also unrole the XMM par like in my first asm case. (I'm interesting about speed measurment. :) )
var
array1, array2: array of integer;
procedure SubQIntArray(var ar1, ar2; length: integer);
asm
//prepare length if not rounded to 4
push ecx
shr length, 2
jz @LengthToSmall
@Loop:
movdqu xmm1, [ar1] //or movdqa but ensure 16b aligment first
movdqu xmm2, [ar2] //or movdqa but ensure 16b aligment first
psubd xmm1, xmm2
movdqu [ar1], xmm1 //or movdqa but ensure 16b aligment first
add ar1, 16
add ar2, 16
dec length
jnz @Loop
@LengthToSmall:
pop ecx
push ebx
and ecx, 3
jz @Exit
mov ebx, [ar2]
sub [ar1], ebx
dec ecx
jz @Exit
mov ebx, [ar2 + 4]
sub [ar1 + 4], ebx
dec ecx
jz @Exit
mov ebx, [ar2 + 8]
sub [ar1 + 8], ebx
@Exit:
pop ebx
end;
begin
//Fill arrays first!
SubQIntArray(Array1[0], Array2[0], length(Array1));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With