I'm looking into using these to improve the performance of some code but good documentation seems hard to find for the functions defined in the *mmintrin.h headers, can anybody provide me with pointers to good info on these?
EDIT: particularly interested in a very basic tutorial on how to get started.
Use SIMD intrinsics. It's like assembly language, but written inside your C/C++ program. SIMD intrinsics actually look like a function call, but generally produce a single instruction (a vector operation instruction, also known as a SIMD instruction).
What are vector intrinsics? To a programmer, intrinsics look just like regular library functions; you include the relevant header, and you can use the intrinsic. To add four float numbers to another four numbers, use the _mm_add_ps intrinsic in your code.
intel-intrinsics is the SIMD library for D. intel-intrinsics lets you use SIMD in D with support for LDC / DMD / GDC with a single syntax and API: the x86 Intel Intrinsics API that is also used within the C, C++, and Rust communities.
SIMD is short for Single Instruction/Multiple Data, while the term SIMD operations refers to a computing method that enables processing of multiple data with a single instruction. In contrast, the conventional sequential approach using one instruction to process each individual data is called scalar operations.
There's a handy Intel Intrinsics Guide for Mac/Linux/Windows at http://software.intel.com/en-us/articles/intel-intrinsics-guide - it covers all Intel SIMD stuff from MMX through the various flavours of SSE up to AVX2 et al.
You can also get the following PDFs from Intel:
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A: Instruction Set Reference, A-M (253666-021)
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B: Instruction Set Reference, N-Z (253667-021)
Intel® SSE4 Programming Reference (D91561-001)
There is now an online version of the intrinsics guide, so you no longer need to install anything, and it's always up-to-date.
This is the best introduction to MMX/SSE programming I ever found. (I've programmed SSE2 for 5 years and I still find this tutorial to be the most conceptually clear.)
http://www.tommesani.com/Docs.html
This is not a complete list of instructions; so once you're ready to learn more, do start reading the Intel intrinsics guide as @PaulR suggests.
One important thing to keep in mind is that MMX/SSE tend to be severely limiting in terms of movements of data (shuffle or arbitrary permutation, or change of single element). This is a limitation of CPU silicon design. Scatter-gather instructions were only added a few years ago, and might not even be available on your customer's computers.
There is a large repertoire of vectorization tricks for MMX/SSE similar to the way http://www.hackersdelight.org/ prescribes tricks for exploiting bit-parallel operations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With