Suppose I have two integer arrays a and b with 10 ints per array. Is there a way I can add the contents of b[i] to a[i] using some "memset" or "memcpy" trick? I'm just looking for something faster than the obvious for loop w/ a[i] += b[i].
"Silly" - I think it's an excellent question!
You say "adding", not "copying", so memset/memcpy won't help here. Assuming x86:
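#include <stddef.h> // ptrdiff_t

void addintvector (int *dstp, const int *srcp, int nints)
{
    int *endp;
    ptrdiff_t offs;
    endp=dstp+nints; // one past the last destination element
    offs=srcp-dstp; // distance from dst to src, in ints (strictly undefined in ISO C
                    // for separate arrays, but works with mainstream x86 compilers)
    while (dstp!=endp)
    {
        *dstp+=*(dstp+offs); // makes use of the [base+index*4] x86 addressing
        dstp+=1; // some prefer ++dstp but I don't when it comes to pointers
    }
}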
With ebx holding dstp, esi the offset and edx endp, the loop should translate into something like:
add_label:
mov eax,[ebx+esi*4]
add [ebx],eax
add ebx,4
cmp ebx,edx
jne add_label
That's five instructions per iteration: for a plain scalar loop it won't get much faster than that!
It's also easy to clone into subtract, divide and multiply variants.
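For example, the subtract and multiply variants differ only in the compound operator. This is a sketch; the names `subintvector` and `mulintvector` are my own, and I've written them with two pointers advancing in lockstep so that no pointer arithmetic crosses from one array into the other. A divide variant would additionally need to guard against zero divisors (and `INT_MIN / -1`), which is left out here.

```c
/* Hypothetical variants of addintvector: only the compound operator changes.
   Both pointers advance together, so every pointer stays within its own array. */
void subintvector(int *dstp, const int *srcp, int nints)
{
    int *endp = dstp + nints;   /* one past the last destination element */
    while (dstp != endp)
    {
        *dstp -= *srcp;         /* dst[i] = dst[i] - src[i] */
        srcp += 1;
        dstp += 1;
    }
}

void mulintvector(int *dstp, const int *srcp, int nints)
{
    int *endp = dstp + nints;
    while (dstp != endp)
    {
        *dstp *= *srcp;         /* dst[i] = dst[i] * src[i]; can overflow like any int multiply */
        srcp += 1;
        dstp += 1;
    }
}
```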
Some suggest using a GPU, but that requires 1. that the GPU can be used from applications and 2. that your array is large enough to overcome the associated overhead, which 10 ints certainly isn't.
To overcome the call/return overhead you could experiment with declaring it inline.
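A minimal sketch of the inline idea, assuming a C99 or later compiler: putting a `static inline` definition in a header (or in the same .c file as the caller) lets the compiler paste the loop into the call site, and with a constant count like your 10 ints it may also unroll it. Whether it actually inlines is the compiler's decision. The name `addintvector_inl` is hypothetical.

```c
/* Hypothetical inline version of the two-pointer addintvector.
   "static inline" gives each translation unit its own copy, so the compiler
   is free to inline it and drop the call/return overhead. */
static inline void addintvector_inl(int *dstp, const int *srcp, int nints)
{
    int *endp = dstp + nints;   /* one past the last destination element */
    while (dstp != endp)
    {
        *dstp += *srcp;         /* dst[i] += src[i] */
        srcp += 1;
        dstp += 1;
    }
}

/* Usage with the two 10-int arrays from the question:
       int a[10], b[10];
       ...
       addintvector_inl(a, b, 10);   -- a[i] += b[i] for i = 0..9
*/
```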
Edit
I just read your comment "since it's for a game on a mobile device". I guess it's not an x86 platform and therefore probably does not have a reg+reg*scale addressing mode. If that is the case, the code should be written as:
void addintvector (int *dstp, const int *srcp, int nints)
{
    int *endp;
    endp=dstp+nints; // one past the last destination element
    while (dstp!=endp)
    {
        *dstp+=*srcp;
        srcp+=1;
        dstp+=1;
    }
}
I don't know which architecture you're targeting, but assuming a RISC load/store design, I guess the loop will be eight instructions long instead (in "unoptimized" pseudocode):
add_label:
mov tempreg1,[srcreg]
mov tempreg2,[dstreg]
add tempreg2,tempreg1
mov [dstreg],tempreg2
add srcreg,4
add dstreg,4
cmp dstreg,endreg
jne add_label
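The same loop can be written more compactly with post-increment. On architectures with post-indexed load/store addressing (ARM, for example, has `ldr r0, [r1], #4`), a compiler may fold the two pointer increments into the load and store, trimming the eight-instruction loop above to six; check your compiler's assembly output to see what it actually emits. The name `addintvector_postinc` is illustrative.

```c
/* Hypothetical compact form of the non-x86 addintvector.
   "*dstp++ += *srcp++" adds *srcp into *dstp, then advances both pointers
   by one int; those increments are what post-indexed addressing can absorb. */
void addintvector_postinc(int *dstp, const int *srcp, int nints)
{
    int *endp = dstp + nints;   /* one past the last destination element */
    while (dstp != endp)
        *dstp++ += *srcp++;
}
```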