Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to write inline assembly codes about LOOP in Xcode LLVM?

I'm studying about inline assembly. I want to write a simple routine in iPhone under Xcode 4 LLVM 3.0 Compiler. I succeed write basic inline assembly codes.

example :

int sub(int a, int b)
{
    int c;
    asm ("sub %0, %1, %2" : "=r" (c) : "r" (a), "r" (b));
    return c;
}

I found it in stackoverflow.com and it works very well. But, I don't know how to write code about LOOP.

I need to assembly codes like

void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity)
{
    for(int i=0; i<numPixels; i++)
    {
        dst[i] = src[i] + intensity;
    }
}
like image 268
이영보 Avatar asked Dec 23 '11 08:12

이영보


1 Answers

Take a look here at the loop section - http://en.wikipedia.org/wiki/ARM_architecture

Basically you'll want something like:

void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    asm volatile (
                  "\t mov r3, #0\n"
                  "Lloop:\n"
                  "\t cmp r3, %2\n"
                  "\t bge Lend\n"
                  "\t ldrb r4, [%0, r3]\n"
                  "\t add r4, r4, %3\n"
                  "\t strb r4, [%1, r3]\n"
                  "\t add r3, r3, #1\n"
                  "\t b Lloop\n"
                  "Lend:\n"
                 : "=r"(src), "=r"(dst), "=r"(numPixels), "=r"(intensity)
                 : "0"(src), "1"(dst), "2"(numPixels), "3"(intensity)
                 : "cc", "r3", "r4");
}

Update:

And here's that NEON version:

void brighten_neon(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    asm volatile (
                  "\t mov r4, #0\n"
                  "\t vdup.8 d1, %3\n"
                  "Lloop2:\n"
                  "\t cmp r4, %2\n"
                  "\t bge Lend2\n"
                  "\t vld1.8 d0, [%0]!\n"
                  "\t vqadd.s8 d0, d0, d1\n"
                  "\t vst1.8 d0, [%1]!\n"
                  "\t add r4, r4, #8\n"
                  "\t b Lloop2\n"
                  "Lend2:\n"
                  : "=r"(src), "=r"(dst), "=r"(numPixels), "=r"(intensity)
                  : "0"(src), "1"(dst), "2"(numPixels), "3"(intensity)
                  : "cc", "r4", "d1", "d0");
}

So this NEON version will do 8 at a time. It does however not check that numPixels is divisible by 8 so you'd definitely want to do that otherwise things will go wrong! Anyway, it's just a start at showing you what can be done. Notice the same number of instructions, but action on eight pixels of data at once. Oh and it's got the saturation in there as well that I assume you would want.

like image 73
mattjgalloway Avatar answered Nov 14 '22 00:11

mattjgalloway