
New posts in sse

How to check inf for AVX intrinsic __m256

c++ c sse intrinsics avx

float point multiplication: LOSING speed with AVX against SSE?

c++ performance sse avx

__m256d TRANSPOSE4 Equivalent?

c++ matrix sse transpose avx

Convert __m128d to double

c++ sse

Intel intrinsics: multiply interleaved 8bit values

c intel sse simd intrinsics

Enabling arch:SSE2 makes program slower

c++ sse

Are there Neon equivalents to Sse2 _mm_unpackhi/lo_epi32/64 and _mm_shuffle_epi8/32?

c++ arm sse simd neon

Convert __m128i value into std::tuple

c++ c++11 sse simd

AVX 3.6x slower than IA32 in simple benchmark involving <cmath> operations - why so? (VS2013)

c++ visual-studio sse simd avx

What is the fastest/best way to combine registers with arbitrary lane selections in AVX/SSE?

intel sse intrinsics avx

Do the higher level SSE flags imply the lower ones in GCC / clang?

gcc sse

Implicit SSE/AVX loads/stores and the stack

sse avx

Must all 16 bytes of an x86 MASKMOVDQU instruction be valid memory?

assembly x86 alignment sse

SSE/AVX floating point convert exceptions

Writing a piece of C code such that compiler uses SSE4.1 instruction for generating assembly Code

c optimization gcc sse simd

Intel x86_64 assembly compare signed double precision floats

Testing which trits are set in a binary representation

Euclidean distance using intrinsic instruction

Convert 16 bits mask to 16 bytes mask