
RyuJIT not making full use of SIMD intrinsics

Tags: c#, avx, simd, sse, ryujit

I'm running some C# code that uses System.Numerics.Vector<T>, but as far as I can tell I'm not getting the full benefit of SIMD intrinsics. I'm using Visual Studio Community 2015 with Update 1, and my clrjit.dll is v4.6.1063.1.

I'm running on an Intel Core i5-3337U processor, which implements the AVX instruction set extensions. I therefore expected most SIMD instructions to execute on 256-bit registers: the disassembly should contain instructions like vmovups, vmovupd, vaddps, etc., and Vector<float>.Count should return 8, Vector<double>.Count should return 4, and so on. But that's not what I'm seeing.

Instead, my disassembly contains instructions like movups, movupd, addps, etc., and the following code:

WriteLine($"{Vector<byte>.Count} bytes per operation");
WriteLine($"{Vector<float>.Count} floats per operation");
WriteLine($"{Vector<int>.Count} ints per operation");
WriteLine($"{Vector<double>.Count} doubles per operation");

Produces:

16 bytes per operation
4 floats per operation
4 ints per operation
2 doubles per operation

Where am I going wrong? To see all project settings etc. the project is available here.

eoinmullan asked Jan 20 '16 10:01



1 Answer

Your processor is a bit dated: its micro-architecture is Ivy Bridge, the "tick" that followed Sandy Bridge, a die shrink without significant architectural changes. Your nemesis is this bit of code in RyuJIT, in the CILJit::getMaxIntrinsicSIMDVectorLength() function in ee_il_dll.cpp:

if (((cpuCompileFlags & CORJIT_FLG_PREJIT) == 0) &&
    ((cpuCompileFlags & CORJIT_FLG_FEATURE_SIMD) != 0) &&
    ((cpuCompileFlags & CORJIT_FLG_USE_AVX2) != 0))
{
    static ConfigDWORD fEnableAVX;
    if (fEnableAVX.val(CLRConfig::EXTERNAL_EnableAVX) != 0)
    {
        return 32;
    }
}

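Note the use of CORJIT_FLG_USE_AVX2. Your processor does not support AVX2; that extension first became available in Haswell, the next micro-architecture after Ivy Bridge (a "tock"). AVX alone is not enough to pass this check, so the 32-byte path is never taken and Vector<T> stays at the 16 bytes you observed, with SSE instructions in the disassembly. Very nice processor btw; discoveries like this one have a major wow factor.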
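To relate the 16 versus 32 byte vector length to the counts printed in the question, here is a minimal C# sketch. It assumes Vector<T>.Count is simply the vector width in bytes divided by the element size. VectorWidthSketch and ElementsPerVector are hypothetical names used only for illustration; they are not part of System.Numerics or RyuJIT.

```csharp
using System;
using System.Numerics;

static class VectorWidthSketch
{
    // Hypothetical helper: how many elements of a given size fit in a
    // SIMD vector of the given width (both in bytes).
    public static int ElementsPerVector(int vectorBytes, int elementBytes)
        => vectorBytes / elementBytes;

    static void Main()
    {
        // What the JIT actually picked on the machine running this code.
        Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");
        Console.WriteLine($"Vector<float>.Count: {Vector<float>.Count}");

        // 16 bytes is the SSE-sized vector seen in the question;
        // 32 bytes is the length returned when the AVX2 check passes.
        foreach (int vectorBytes in new[] { 16, 32 })
        {
            Console.WriteLine(
                $"{vectorBytes}-byte vector: " +
                $"{ElementsPerVector(vectorBytes, sizeof(byte))} bytes, " +
                $"{ElementsPerVector(vectorBytes, sizeof(float))} floats, " +
                $"{ElementsPerVector(vectorBytes, sizeof(int))} ints, " +
                $"{ElementsPerVector(vectorBytes, sizeof(double))} doubles");
        }
    }
}
```

The 16-byte row reproduces the 16/4/4/2 output from the question, and the 32-byte row gives the 32/8/8/4 that an AVX2-capable processor would be expected to report.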

Nothing you can do about this but go shopping. For inspiration, you can look at the kind of code it generates in this post.

Hans Passant answered Oct 24 '22 07:10