
New posts in sse

Efficient way to convert scatter indices into gather indices?

Permuting bytes inside SSE __m128i register

optimization sse simd

How to merge a scalar into a vector without the compiler wasting an instruction zeroing upper elements? Design limitation in Intel's intrinsics?

c gcc x86 sse intrinsics

Can PTEST be used to test if two registers are both zero or some other condition?

assembly x86 sse intrinsics sse4

libc's system() causes a segmentation fault when the stack pointer is not 16-byte aligned

Neon equivalent to SSE intrinsics

c arm sse multiplication neon

Faster assembly optimized way to convert between RGB8 and RGB32 image

Is there still any development on SIMD in Mono?

c# mono sse simd

Matrix-vector multiplication in AVX not proportionately faster than in SSE

Print value of __m128 datatype in gdb debugger

c++ gdb sse simd intrinsics

How to convert 'long long' (or __int64) to __m64

Bypass delays when switching execution unit domains

assembly intel sse

Optimal SSE unsigned 8 bit compare

c x86 sse simd sse4

Questions regarding operations on NaN

SSE intrinsics - comparison if/else optimization

c++ sse intrinsics

Fastest way to compare one byte array with many others?

c algorithm assembly x86-64 sse

Fast transposition of an image and Sobel Filter optimization in C (SIMD)

c optimization sse simd

SSE: unaligned load and store that crosses a page boundary

"Safe" SIMD arithmetic on aligned vectors of odd size?

Loading non-contiguous values with Intel SIMD SSE

assembly x86 intel sse simd