
New posts in sse

AVX 3.6x slower than IA32 in simple benchmark involving <cmath> operations - why so? (VS2013)

c++ visual-studio sse simd avx

What is the fastest/best way to combine registers with arbitrary lane selections in AVX/SSE?

intel sse intrinsics avx

Do the higher level SSE flags imply the lower ones in GCC / clang?

gcc sse

Implicit SSE/AVX loads/stores and the stack

sse avx

Must all 16 bytes of an x86 MASKMOVDQU instruction be valid memory?

assembly x86 alignment sse

SSE/AVX floating point convert exceptions

Writing a piece of C code such that compiler uses SSE4.1 instruction for generating assembly Code

c optimization gcc sse simd

Intel x86_64 assembly compare signed double precision floats

Testing which trits are set in a binary representation

Euclidean distance using intrinsic instruction

Convert 16 bits mask to 16 bytes mask

Broadcast one arbitrary element of __m128 vector

c++ x86 sse simd sse2

Most efficient way to convert vector of float to vector of uint32?

assembly floating-point sse

SSE2 8x8 byte-matrix transpose code twice as slow on Haswell+ than on Ivy Bridge

Loop is not vectorized when variable extent is used

Sign of the maximum absolute value in an __m128, SSE4

c++ sse simd

Cost of an if check vs. an SSE operation?

c sse

Moving 2 QWORDs from general purpose registers into an XMM register as high/low [duplicate]

assembly x86-64 masm sse

Fast way to set single bit in SSE datatypes (__m128i)?

c++ bit-manipulation intel sse