Add all elements in a lane

Question

Is there an intrinsic that adds all of the elements (lanes) of a vector together? I am using NEON to multiply two sets of 8 numbers element by element, and I need to sum the resulting products. Here is some paraphrased code showing what I'm currently doing (it could probably be optimised):

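#include <arm_neon.h>
#include <stdint.h>

int16_t p[8], q[8], r[8];
int32_t sum;
int16x8_t pneon, qneon, result;

p[0] = some_number;
p[1] = some_other_number;
//etc etc
pneon = vld1q_s16(p);

q[0] = some_other_other_number;
q[1] = some_other_other_other_number;
//etc etc
qneon = vld1q_s16(q);
result = vmulq_s16(pneon, qneon); // lane-wise multiply; keeps the low 16 bits of each product
vst1q_s16(r, result);             // spill the 8 products back to memory
sum = (int32_t) r[0] + (int32_t) r[1] + (int32_t) r[2] + (int32_t) r[3]
    + (int32_t) r[4] + (int32_t) r[5] + (int32_t) r[6] + (int32_t) r[7];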
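For checking results, a plain-C reference of the intended computation can help. This is a minimal sketch; the name `dot8_s16_ref` is hypothetical. It forms each product at full 32-bit width, whereas `vmulq_s16` keeps only the low 16 bits of each product, so the NEON code above agrees with this reference only while every product fits in `int16_t`.

```c
#include <stdint.h>

/* Hypothetical scalar reference: multiply eight int16_t pairs and sum the
 * products. Each product is formed in int32_t, where it always fits
 * (|p[i] * q[i]| <= 2^30). The running sum is int32_t, as in the question,
 * so this assumes the total stays within the int32_t range. */
int32_t dot8_s16_ref(const int16_t p[8], const int16_t q[8])
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += (int32_t)p[i] * (int32_t)q[i];
    }
    return sum;
}
```

For p = {1,2,3,4,5,6,7,8} and q = {8,7,6,5,4,3,2,1} it returns 120. A lane holding 300 * 300 contributes 90000 here, while `vmulq_s16` would keep only 90000 mod 65536 = 24464 for that lane.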

Is there a "better" way to do this?

user3249055 · Accepted Answer

If you're targeting the newer 64-bit ARM architecture (AArch64), then ADDV is exactly the right instruction for you.

Here's how your code will look with it.

qneon = vld1q_s16(q);
result = vmulq_s16(pneon, qneon);
sum = vaddvq_s16(result); // ADDV: sum of all eight lanes, returned as int16_t

That's it: a single instruction sums all of the lanes in the vector register.
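One detail worth checking: `vaddvq_s16` returns an `int16_t`, so the lane sum wraps at 16 bits even when it is assigned to an `int32_t`. AArch64 also provides `vaddlvq_s16` (the SADDLV instruction), which sums the eight lanes into an `int32_t` and so matches the casts in the question's scalar code. The sketch below assumes an AArch64 compiler for the NEON path; the name `sum_products_s16` and the scalar fallback (present only so the file also builds on other targets) are illustrative, not part of the answer.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: lane-wise 16-bit multiply, then a widening
 * horizontal add of all eight lanes into an int32_t. */
int32_t sum_products_s16(const int16_t p[8], const int16_t q[8])
{
#if defined(__aarch64__)
    int16x8_t pneon = vld1q_s16(p);
    int16x8_t qneon = vld1q_s16(q);
    int16x8_t result = vmulq_s16(pneon, qneon); /* low 16 bits of each product */
    /* vaddvq_s16(result) would return int16_t and wrap at 16 bits;
     * vaddlvq_s16 (SADDLV) widens the sum to int32_t instead. */
    return vaddlvq_s16(result);
#else
    /* Scalar fallback for non-AArch64 builds with the same semantics:
     * truncate each product to 16 bits, then sum in 32 bits. The
     * conversion to int16_t wraps on GCC and Clang (it is
     * implementation-defined in ISO C). */
    int32_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += (int16_t)(p[i] * q[i]);
    }
    return sum;
#endif
}
```

With all eight lanes holding 100 * 100, this returns 80000, whereas `vaddvq_s16` on the same vector would wrap to 14464.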

Sadly, ADDV doesn't exist in the older 32-bit ARM architecture (ARMv7 NEON).
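On 32-bit ARM, a common substitute is a short chain of pairwise adds. The sketch below assumes a NEON-enabled compiler (`__ARM_NEON` or `__ARM_NEON__` defined, e.g. GCC with `-mfpu=neon`): `vpaddlq_s16` adds adjacent lanes into four `int32_t` lanes, and the remaining four lanes are folded with `vadd_s32` and `vpadd_s32`. The name `sum_products_s16_v7` is hypothetical, and the scalar fallback exists only so the file builds on non-NEON targets.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sum the eight int16_t lanes of p[i] * q[i]
 * (products truncated to 16 bits, as vmulq_s16 does) without ADDV. */
int32_t sum_products_s16_v7(const int16_t p[8], const int16_t q[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t result = vmulq_s16(vld1q_s16(p), vld1q_s16(q));
    int32x4_t pairs = vpaddlq_s16(result);            /* 8 x s16 -> 4 x s32 */
    int32x2_t half = vadd_s32(vget_low_s32(pairs),
                              vget_high_s32(pairs));  /* 4 lanes -> 2 */
    half = vpadd_s32(half, half);                     /* 2 lanes -> 1, in lane 0 */
    return vget_lane_s32(half, 0);
#else
    /* Scalar fallback: truncate each product to 16 bits, sum in 32 bits.
     * The int16_t conversion wraps on GCC and Clang. */
    int32_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += (int16_t)(p[i] * q[i]);
    }
    return sum;
#endif
}
```

The widening pairwise add means the final sum never overflows: eight 16-bit lanes total at most 262144 in magnitude.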

Tags: c, simd, arm, neon