
New posts in sse

Using sse and avx intrinsics to add a set of packed singles into one value

c++ c++11 sse avx

"Missing" arithmetic instructions in Tilera and SSE. How are the operations done?

Byte-wise operations on an xmm register (AMD64)

Slow SIMD performance - no inlining

rust simd sse avx2

SSE _mm_load_ps causing segmentation faults

How much faster are SSE4.2 string instructions than SSE2 for memcmp?

Nibble shuffling with x64 SIMD

x86-64 simd sse

Websocket data unmasking / multi byte xor

c x86 sse simd avx

Does VS2010 SP1 support only part of the AVX instruction set?

How to efficiently add two vectors in C++

c++ x86 sse simd sse2

Different semantics of comparison intrinsic instructions in avx512?

c++ sse intrinsics avx avx512

Integer dot product using SSE/AVX?

c++ vectorization sse simd avx

Can I enable vectorization only for one part of the code?

c++ gcc sse pragma

Intel vector instruction to zero-extend 8 4-bit values packed in a 32-bit int to a __m256i?

sse avx avx2

SSE much slower than regular function

How abundant is hardware support for the FMA instruction set?

x86 hardware sse simd avx

"Extend" data type size in SSE register

c sse simd