my image processing project works with grayscale images. I have ARM Cortex-A8 processor platform. I want to make use of the NEON.
I have a grayscale image( consider the example below) and in my alogorithm, I have to add only the columns.
How can I load four 8-bit pixel values in parallel, which are uint8_t, as four uint32_t into one of the 128-bit NEON registers? What intrinsic do I have to use to do this?
I mean:
I must load them as 32 bits because if you look carefully, the moment I do 255 + 255 is 512, which can't be held in a 8-bit register.
e.g.
255 255 255 255 ......... (640 pixels)
255 255 255 255
255 255 255 255
255 255 255 255
.
.
.
.
.
(480 pixels)
I will recommend that you spend a bit of time understanding how SIMD works on ARM. Look at:
Take a look at:
to get you started. You can then implement your SIMD code using inline assembler or corresponding ARM intrinsics recommended by domen.
Depends on your compiler and (possible lack of) extensions.
Ie. for GCC, this might be a starting point: http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With