New posts in neon

neon float multiplication is slower than expected

c++ gcc arm simd neon

Fastest Inverse Square Root on iPhone

ARM GCC bug? Uses chains of vldr instead of one vldmia…

gcc assembly arm neon

Sum all elements in a quadword vector in ARM assembly with NEON

math assembly arm neon

Loop takes more cycles to execute than expected in an ARM Cortex-A72 CPU

Efficient floating point comparison (Cortex-A8)

c++ c neon cortex-a8 arm7

LSB to MSB bit reversal on ARM

arm bit-manipulation neon

ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

c++ c arm vectorization neon

How can I optimize a looped 4D matrix-vector-multiplication with ARM NEON?

android c android-ndk arm neon

Compacting data in buffer from 16 bit per element to 12 bits

c arm simd neon

How to convert _mm_shuffle_ps SSE intrinsic to NEON intrinsic?

arm sse simd neon

On iOS how to quickly convert RGB24 to BGR24?

Summing 3 lanes in a NEON float32x4_t

ios arm simd neon intrinsics

Is there an advantage of specifying "-mfpu=neon-vfpv3" over "-mfpu=neon" for ARMs with separate pipelines?

gcc assembly arm neon armv7

Fastest way to test a 128 bit NEON register for a value of 0 using intrinsics?

neon

128-bit rotation using ARM Neon intrinsics

c rotation intrinsics neon

ARM and NEON can work in parallel?

SSE _mm_movemask_epi8 equivalent method for ARM NEON

arm sse neon

128bit hash comparison with SSE

Which one is better, gcc or armcc for NEON optimizations?

embedded arm simd neon cortex-a8