New posts in sse

Proper way to enable SSE4 on a per-function / per-block of code basis?

xcode clang llvm sse

SSE: convert short integer to float

x86 sse simd

How to get GCC to use more than two SIMD registers when using intrinsics?

gcc assembly x86 sse simd

byte array permute SSE optimization

c++ gcc x86-64 sse simd

NEON vs Intel SSE - equivalence of certain operations

c++ c sse simd neon

indexing into an array with SSE

c sse simd

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructions?

c++ assembly sse avx avx2

Using std::atomic with aligned classes

c++ c++11 sse

Why does gcc/clang use two 128bit xmm registers to pass a single value?

c++ c assembly clang sse

When will a program benefit from prefetch & non-temporal load/store?

c sse prefetch temporal

Am I breaking strict aliasing rules?

c++ c++11 sse strict-aliasing

8 bit shift operation in AVX2 with shifting in zeros

c sse simd avx avx2

G++ SSE memory alignment on the stack

Does the Linux kernel have its own SSE/AVX context?

Optimizing variable-length encoding

c++ c assembly sse

Does the compiler use SSE instructions for regular C code?

Fastest way to expand bits in a field to all (overlapping + adjacent) set bits in a mask?

c assembly x86 sse avx

Is an __m128i variable zero?

c++ c intel sse simd

SIMD signed with unsigned multiplication for 64-bit * 64-bit to 128-bit

Strict aliasing, -ffast-math and SSE