
Using Multiply Accumulate Instruction Inline Assembly in C++

I am implementing a FIR filter on an ARM9 processor and am trying to use the SMLAL instruction.

Initially I implemented the filter as follows. It works perfectly, but it uses too much processing power for our application.

uint32_t DDPDataAcq::filterSample_8k(uint32_t sample)
 {
    // This routine is based on the fir_double_z routine outlined by Grant R Griffin
    // - www.dspguru.com/sw/opendsp/alglib.htm 
    int i = 0; 
    int64_t accum = 0; 
    const int32_t *p_h = hCoeff_8K; 
    const int32_t *p_z = zOut_8K + filterState_8K;


    /* Cast the sample to a signed 32-bit int.
     * We need to preserve the signedness of the number, so if the 24-bit
     * sample is negative we need to move the sign bit up to the MSB and pad the number
     * with 1's to preserve two's complement.
     */
    int32_t s = sample; 
    if (s & 0x800000)
        s |= ~0xffffff;

    // store input sample at the beginning of the delay line as well as ntaps more
    zOut_8K[filterState_8K] = zOut_8K[filterState_8K+NTAPS_8K] = s;

    for (i =0; i<NTAPS_8K; ++i)
    {
        accum += (int64_t)(*p_h++) * (int64_t)(*p_z++);
    }

    //convert the 64 bit accumulator back down to 32 bits
    int32_t a = (int32_t)(accum >> 9);


    // decrement state, wrapping if below zero
    if ( --filterState_8K < 0 )
        filterState_8K += NTAPS_8K;

    return a; 
} 
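The sign-extension step at the top of the routine can be checked in isolation. The following is a minimal sketch, assuming the 24-bit sample sits in the low bits of the 32-bit word; the helper name `signExtend24` is hypothetical and does not appear in the original code. The final cast relies on the usual two's-complement conversion, which GCC provides.

```cpp
#include <cstdint>

// Sign-extend a 24-bit two's-complement value held in the low bits of a
// 32-bit word. Hypothetical helper; the original routine does this inline.
inline int32_t signExtend24(uint32_t sample)
{
    uint32_t s = sample & 0xFFFFFFu;   // keep only the 24 data bits
    if (s & 0x800000u)                 // bit 23 is the sign bit of the sample
        s |= 0xFF000000u;              // fill bits 24..31 with ones
    return static_cast<int32_t>(s);    // reinterpret the bit pattern as signed
}
```

For example, `signExtend24(0xFFFFFF)` yields -1 and `signExtend24(0x7FFFFF)` yields 8388607.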

I have been attempting to replace the multiply-accumulate with inline assembly, since GCC is not using a MAC instruction even with optimization turned on. I replaced the for loop with the following:

uint32_t accum_low = 0; 
int32_t accum_high = 0; 

for (i =0; i<NTAPS_4K; ++i)
{
    __asm__ __volatile__("smlal %0,%1,%2,%3;"
        :"+r"(accum_low),"+r"(accum_high)
        :"r"(*p_h++),"r"(*p_z++)); 
} 

accum = (int64_t)accum_high << 32 | (accum_low); 

The output I now get using the SMLAL instruction is not the filtered data I was expecting. I have been getting random values that seem to have no pattern or connection to the original signal or the data I am expecting.

I have a feeling I am doing something wrong when splitting the 64-bit accumulator into the high and low registers for the instruction, or when putting them back together. Either way, I am not sure why I cannot get the correct output by swapping the C code for the inline assembly.
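One way to rule the split and recombine arithmetic in or out is to model it on the host. The sketch below is an assumption-laden check, not ARM code and not a diagnosis of the problem above: `smlalModel` and `combine` are hypothetical helpers that mimic what SMLAL does to a low/high register pair, with the low half kept unsigned so it is zero-extended when the halves are joined.

```cpp
#include <cstdint>

// Join the two 32-bit halves into one signed 64-bit value.
// The low half is unsigned so it is zero-extended, not sign-extended.
inline int64_t combine(uint32_t lo, int32_t hi)
{
    uint64_t bits = (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) | lo;
    return static_cast<int64_t>(bits);
}

// Portable model of one SMLAL step: {hi:lo} += (int64_t)a * (int64_t)b.
inline void smlalModel(uint32_t &lo, int32_t &hi, int32_t a, int32_t b)
{
    uint64_t acc = static_cast<uint64_t>(combine(lo, hi));
    acc += static_cast<uint64_t>(static_cast<int64_t>(a) * static_cast<int64_t>(b));
    lo = static_cast<uint32_t>(acc);                              // low 32 bits
    hi = static_cast<int32_t>(static_cast<uint32_t>(acc >> 32));  // high 32 bits
}
```

If this model, fed the same coefficients and samples, agrees with the original 64-bit C loop, the recombination expression is unlikely to be the culprit and attention can shift to the loop bounds and the asm constraints.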

John C asked Aug 23 '10 17:08




1 Answer

Which compiler version are you using? I compiled your C-only code with GCC 4.4.3 and the options -O3 -march=armv5te, and it generated smlal instructions.
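To repeat that experiment, the multiply-accumulate loop can be pulled out into a small free-standing function and the generated assembly inspected. This is a minimal sketch: the function name `macLoop` and its parameters are hypothetical, and whether smlal is actually emitted depends on the compiler version and flags in use.

```cpp
#include <cstddef>
#include <cstdint>

// Free-standing version of the 64-bit multiply-accumulate loop from the
// question. Hypothetical helper for inspecting the compiler's output.
int64_t macLoop(const int32_t *h, const int32_t *z, std::size_t ntaps)
{
    int64_t accum = 0;
    for (std::size_t i = 0; i < ntaps; ++i)
    {
        // Widen both operands so the product is a full signed 64-bit value.
        accum += static_cast<int64_t>(h[i]) * static_cast<int64_t>(z[i]);
    }
    return accum;
}
```

Compiling this with an ARM cross GCC using -O3 -march=armv5te -S and searching the resulting .s file for smlal shows whether the compiler already selects the instruction, in which case the inline assembly is unnecessary.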

Nils Pipenbrinck answered Oct 01 '22 19:10