Mixing NEON assembly with non-vector functions

Question

I think I found the answer to my own question. VFP has an "fmacs" instruction, which performs a scalar single-precision multiply-accumulate on the registers shared by NEON and VFP, and that may do the trick.

I'm very new to NEON or ARM programming...

I want to load an upper triangular matrix into NEON registers and accumulate the outer product of a vector in single precision. The basic idea is A += x'*x, where A is an upper triangular matrix. Some of the operations can be vectorized with the NEON instruction "vmla.f32" on quad or double registers. However, sometimes I only need to operate on one single-precision register at a time, i.e. not on 2 or 4 single-precision values at once. In the example below (which does not work), I'm interested in the line

// A[8-14] += A[1]*x[1-7]
"mla  s16, s16, d0[1]\n\t"

I want to use the NEON registers to perform one single precision operation.

Code snippet:

    __asm__ volatile (
    // load x into registers
    "vldmia    %0, {d0-d3}\n\t"
    // load A into registers
    "vldmia    %1, {d4-d12}\n\t"
    "vldmia    %1, {d13-d21}\n\t"
    // A[0-7] += x[0]*x[0-7]
    "vmla.f32  q2, q2, d0[0]\n\t"
    "vmla.f32  q3, q3, d0[0]\n\t"
    // A[8-14] += A[1]*x[1-7]
    "mla  s16, s16, d0[1]\n\t"
    // output
    :
    // input
    : "r"(A), "r"(x)
    // registers
    : "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10"
    );

Guy Sirton · Accepted Answer

So I think you're asking about multiplying a vector with a scalar?

I would use "vdup" to load the scalar into all lanes of a NEON register and then multiply.
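As a rough illustration of the vdup idea, here is a minimal sketch using NEON intrinsics instead of inline assembly. The function name `scaled_accumulate` and its signature are made up for this example and are not from the question. It assumes the compiler defines `__ARM_NEON` when NEON is available, and falls back to plain scalar code otherwise. Elements that do not fill a whole 4-lane register are handled by the scalar loop at the end, which is one way to deal with the "one float at a time" case.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the original post):
 * acc[i] += s * v[i] for i in [0, n). */
void scaled_accumulate(float *acc, const float *v, float s, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* vdup: broadcast the scalar s into all four lanes of a q register. */
    const float32x4_t vs = vdupq_n_f32(s);
    for (; i + 4 <= n; i += 4) {
        float32x4_t a = vld1q_f32(acc + i);  /* load 4 accumulators */
        float32x4_t b = vld1q_f32(v + i);    /* load 4 vector elements */
        a = vmlaq_f32(a, b, vs);             /* a += b * vs, lane by lane */
        vst1q_f32(acc + i, a);               /* store the result back */
    }
#endif
    /* Scalar tail: leftover elements, or everything when NEON is absent. */
    for (; i < n; ++i) {
        acc[i] += s * v[i];
    }
}
```

For the second row in the question, and assuming A is packed row by row and the comment means x[1], the call would be `scaled_accumulate(A + 8, x + 1, x[1], 7)`.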

If you can post a plain C version of what you're trying to do I could try and help more...
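For reference, a plain C version of the operation described in the question might look like the sketch below. This is a guess at the intent, not the asker's code: it assumes A stores only the upper triangle, packed row by row (row 0 in A[0-7], row 1 in A[8-14], and so on for n = 8), and it reads the comment "A[1]*x[1-7]" as x[1]*x[1-7]. The function name `outer_accumulate_upper` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): accumulates the upper
 * triangle of the outer product of x with itself into A.
 * Assumed layout: A is packed row by row, row i holding columns i..n-1,
 * so A contains n*(n+1)/2 floats in total. */
void outer_accumulate_upper(float *A, const float *x, size_t n)
{
    assert(A != NULL && x != NULL);
    size_t k = 0;                    /* running index into packed A */
    for (size_t i = 0; i < n; ++i) {
        const float xi = x[i];       /* scalar multiplier for row i */
        for (size_t j = i; j < n; ++j) {
            A[k++] += xi * x[j];     /* A[i][j] += x[i] * x[j] */
        }
    }
}
```

With n = 8 the rows have 8, 7, 6, ... 1 elements, for 36 floats in total. Only rows whose length is a multiple of 4 map cleanly onto q registers, which is where the need for narrower or scalar operations comes from.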

Tags: assembly, arm, neon

paul

1 Answers

Guy Sirton

Recent Activity

Donate For Us

Mixing NEON assembly with non-vector functions

Tags:

assembly

arm

neon

paul

1 Answers

Guy Sirton

Related questions

Recent Activity

Donate For Us