Add all elements in a lane

Tags: c, simd, arm, neon

Is there an intrinsic that adds together all of the lanes of a vector? I am using NEON to multiply 8 pairs of numbers, and I need to sum the results. Here is some paraphrased code showing what I'm currently doing (it could probably be optimised):

int16_t p[8], q[8], r[8];
int32_t sum;
int16x8_t pneon, qneon, result;

p[0] = some_number;
p[1] = some_other_number; 
//etc etc
pneon = vld1q_s16(p);

q[0] = some_other_other_number;
q[1] = some_other_other_other_number;
//etc etc
qneon = vld1q_s16(q);
result = vmulq_s16(pneon, qneon);
vst1q_s16(r,result);
sum = ((int32_t) r[0] + (int32_t) r[1] + ... //etc );

Is there a "better" way to do this?

NOP asked Aug 29 '12 04:08


1 Answer

If you're targeting the newer 64-bit ARM architecture (AArch64), then ADDV is just the right instruction for you.

Here's how your code will look with it.

qneon = vld1q_s16(q);
result = vmulq_s16(pneon, qneon);
sum = vaddvq_s16(result);

That's it. Just one instruction to sum up all of the lanes in the vector register.
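
For a complete picture, here is a minimal sketch of the whole computation as one function. It assumes you want the 32-bit sum from the question, so it uses vaddlvq_s16 (the widening form, which returns int32_t) rather than vaddvq_s16, whose int16_t result can wrap. The function name sum_of_products_s16 and the scalar fallback for non-AArch64 builds are my own additions for illustration, not part of the original code.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sums the eight 16-bit products p[i] * q[i] into a
 * 32-bit result. Each product is truncated to 16 bits first, which is what
 * vmulq_s16 does. */
int32_t sum_of_products_s16(const int16_t p[8], const int16_t q[8])
{
#if defined(__aarch64__)
    int16x8_t pneon  = vld1q_s16(p);
    int16x8_t qneon  = vld1q_s16(q);
    int16x8_t result = vmulq_s16(pneon, qneon);
    /* Widening add across all eight lanes (SADDLV), returns int32_t. */
    return vaddlvq_s16(result);
#else
    /* Scalar fallback so the sketch also builds on non-AArch64 hosts.
     * The cast mimics the 16-bit truncation of vmulq_s16 (assumes the
     * usual wrapping conversion to int16_t). */
    int32_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += (int16_t)(p[i] * q[i]);
    }
    return sum;
#endif
}
```

If the individual products can exceed 16 bits, this is not enough on its own: the truncation happens in vmulq_s16, before the sum.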

Sadly, this instruction isn't available in the older 32-bit ARM architecture.
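
On 32-bit ARM, one common workaround is a short chain of pairwise adds. The following is a sketch of that idea rather than the only way to do it; the function name horizontal_add_s16x8 and the scalar fallback for non-NEON builds are assumptions added for illustration.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns the 32-bit sum of eight int16_t values. */
int32_t horizontal_add_s16x8(const int16_t r[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t v = vld1q_s16(r);
    /* Add adjacent pairs and widen: 8 x int16 -> 4 x int32. */
    int32x4_t pairs = vpaddlq_s16(v);
    /* Add the low and high halves: 4 x int32 -> 2 x int32. */
    int32x2_t halves = vadd_s32(vget_low_s32(pairs), vget_high_s32(pairs));
    /* Pairwise add again; both lanes now hold the full sum. */
    int32x2_t total = vpadd_s32(halves, halves);
    return vget_lane_s32(total, 0);
#else
    /* Scalar fallback so the sketch also builds on hosts without NEON. */
    int32_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += r[i];
    }
    return sum;
#endif
}
```

In real code you would apply the same intrinsics directly to result and skip the round trip through memory; the array parameter is only there so the sketch also compiles without NEON.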

user3249055 answered Oct 04 '22 11:10