New posts in simd

Print value of __m128 datatype in gdb debugger

c++ gdb sse simd intrinsics

How to concatenate two vectors efficiently using AVX2? (a lane-crossing version of VPALIGNR)

c simd intrinsics avx avx2

How to convert 'long long' (or __int64) to __m64

Optimal SSE unsigned 8-bit compare

c x86 sse simd sse4

How to compile SIMD code with gcc

c++ gcc g++ simd

Fast transposition of an image and Sobel Filter optimization in C (SIMD)

c optimization sse simd

Performance of unaligned SIMD load/store on aarch64

alignment simd neon arm64

"Safe" SIMD arithmetic on aligned vectors of odd size?

AVX 256-bit equivalent for _mm_load1_ps

simd intrinsics avx

Loading non-contiguous values with Intel SIMD SSE

assembly x86 intel sse simd

AVX-512 and Branching

Which assemblers currently support the AVX instruction set?

x86 assembly simd avx intel

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros

x86 sse simd avx avx2

Efficient way of rotating a byte inside an AVX register

c sse simd avx avx2

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32

How to optimize C code with SSE intrinsics for packed 32x32 => 64-bit multiplies, and unpacking the halves of those results (for Galois fields)

c optimization x86 sse simd

SSE multiplication of 2 64-bit integers

x86 sse simd multiplication sse2

Does Haskell perform SIMD optimizations automatically?

haskell simd

Profiling SIMD Code

c++ c sse simd

Optimal SIMD algorithm to rotate or transpose an array