
MMX intrinsics like _mm_cvtpd_pi32 not found with MSVC 2019 for 64bit targets; change from 2013?

I'm currently working on updating a large codebase from VS2013 to VS2019. One of the compiler errors I've run into is as follows:

intrinsics.h(348): error C3861: '_mm_cvtpd_pi32': identifier not found

This intrinsic function is declared in Visual Studio's "emmintrin.h". I only get this error when targeting 64-bit builds. On closer inspection I see that, between 2013 and 2019, the emmintrin.h declarations changed from this:

extern __m64 _mm_cvtpd_pi32(__m128d _A);
extern __m64 _mm_cvttpd_pi32(__m128d _A);
extern __m128d _mm_cvtpi32_pd(__m64 _A);

To this:

#if defined(_M_IX86)
extern __m64 _mm_cvtpd_pi32(__m128d _A);
extern __m64 _mm_cvttpd_pi32(__m128d _A);
extern __m128d _mm_cvtpi32_pd(__m64 _A);
#endif

i.e. the preprocessor directive ensures that the functions are now only available for 32-bit targets. The 3rd-party header file from which the error originates uses these functions regardless of the target (64-bit or 32-bit). Presumably the best course of action here is to edit this header file so that the function is only called for 32-bit targets. However, what I'm more curious about is why this was changed between 2013 and 2019. I see a description of this function here:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtpd_pi32&expand=1705

Was it never applicable to 64bit targets to begin with? Or has it been replaced with a 64bit version that I need to consider?

Nimo asked Mar 30 '20 15:03


1 Answer

I don't know if there's a way to get MSVC 2019 to compile this legacy MMX intrinsic.

It is safe to use MMX instructions in 64-bit code on Windows, but MS doesn't make it easy to build such code using MS compilers. Newer MSVC might not support the intrinsic at all; if there's no workaround for MSVC and you need to compile old code that uses MMX intrinsics, use a better compiler (like clang).

(Early in the history of x86-64 and 64-bit Windows, the fact that MS removed some compiler or assembler support for MMX got some people worried that maybe the Windows kernel wouldn't properly do context-switching for the x87/MMX state. That doubt was unfounded. If you can get MMX code to compile/assemble, e.g. with other tools, it will still run perfectly fine. Windows supports it, and x86-64 CPUs in long mode do still have full support for MMX. I don't use Windows and I don't remember exactly what kind of MMX support was removed.)


Of course, normally it's better to use SSE2 instead of MMX, i.e. the epi32 intrinsics instead of pi32 (or whatever other integer element width). SSE2 is baseline for x86-64, and is also required for double-precision SIMD (including that conversion intrinsic).
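
As a rough sketch of that swap, assuming the calling code only needs the converted values and not an actual __m64, the three MMX-typed conversions from the question map onto SSE2-only intrinsics roughly as below. The helper names are made up for illustration; the intrinsics themselves are the standard ones from emmintrin.h, and this should build for both 32-bit and 64-bit targets.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: replaces _mm_cvtpd_pi32.
   Converts two doubles to two int32 using the current rounding mode
   (round-to-nearest by default). */
static inline void pd_to_i32_round(const double in[2], int32_t out[2])
{
    __m128d v = _mm_loadu_pd(in);
    __m128i r = _mm_cvtpd_epi32(v);        /* results in the low 64 bits, upper 64 bits zeroed */
    _mm_storel_epi64((__m128i *)out, r);   /* store only the low 64 bits */
}

/* Hypothetical helper: replaces _mm_cvttpd_pi32 (truncates toward zero). */
static inline void pd_to_i32_trunc(const double in[2], int32_t out[2])
{
    __m128d v = _mm_loadu_pd(in);
    __m128i r = _mm_cvttpd_epi32(v);
    _mm_storel_epi64((__m128i *)out, r);
}

/* Hypothetical helper: replaces _mm_cvtpi32_pd.
   Converts two int32 to two doubles. */
static inline void i32_to_pd(const int32_t in[2], double out[2])
{
    __m128i v = _mm_loadl_epi64((const __m128i *)in);  /* load 64 bits into the low half */
    _mm_storeu_pd(out, _mm_cvtepi32_pd(v));
}
```

If the surrounding code really does pass __m64 values around, this isn't a drop-in replacement; those call sites would need porting to __m128i as well.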

The use-case for that conversion is (I think) mainly to get MMX integer vectors for use with existing legacy MMX-vectorized code.

But in this specific case cvtpd2pi is actually not slower than cvtpd2dq (the normal SSE2 _mm_cvtpd_epi32): both are 2 uops, I think because even within the XMM register domain it has to shuffle the 32-bit integers to the bottom (see https://www.uops.info/table.html). That's unlike the ps version, where FP->int conversion between XMM registers is single-uop.

In general, though, MMX instructions have worse throughput than the equivalent SSE2/SSE3 instructions on recent CPUs (they run on fewer ports), and mov-elimination doesn't work on them.

Peter Cordes answered Nov 05 '22 11:11