
How to test AVX-512 instructions w/o supported hardware? [closed]

I'm trying to learn x86-64's new AVX-512 instructions, but neither of my computers supports them. I tried various disassemblers (from Visual Studio's to a couple of online ones) to see the instructions for specific opcode encodings, but I'm getting somewhat conflicting results. It would also be nice to run some instructions and see their actual output.

So I'm wondering: is there an online service that lets me compile a small piece of x86-64 assembly and run it, or step through it, on a specific processor? (Say, Intel's Sandy Bridge, Cannon Lake, etc.)

MikeF asked Dec 13 '22 15:12



1 Answer

Use the Intel® Software Development Emulator (SDE) to run an executable on an emulated CPU that supports future instruction sets. It's freeware (a free download, but not open source), and is available for Linux, Windows, and I think also OS X.

https://software.intel.com/en-us/articles/debugging-applications-with-intel-sde has step-by-step instructions for debugging with it on Windows or Linux. SDE can work as a GDB remote: run sde -debug -- ./your-program, then in another terminal run gdb ./your-program and use target remote :portnumber to connect to the SDE process. From there you can set breakpoints and single-step.
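If you end up doing this often, the two-terminal workflow can be scripted. Here is a minimal Python sketch that only builds the two command lines. The helper name sde_debug_commands and the port 12345 are placeholders of mine, not anything from Intel's article; the real port is whatever the SDE process tells you to connect to. It assumes sde and gdb are on your PATH, and uses GDB's -ex option to issue the target remote command at startup.

```python
import shlex


def sde_debug_commands(program, port):
    """Build the argv lists for the two terminals.

    program: path to the executable to debug (placeholder in the demo below)
    port:    the port the SDE process is listening on (placeholder below)
    """
    # Terminal 1: start the program under SDE, waiting for a debugger.
    sde_cmd = ["sde", "-debug", "--", program]
    # Terminal 2: start GDB on the same binary and attach to SDE.
    gdb_cmd = ["gdb", program, "-ex", f"target remote :{port}"]
    return sde_cmd, gdb_cmd


# Print shell-quoted versions that can be pasted into each terminal.
sde_cmd, gdb_cmd = sde_debug_commands("./your-program", 12345)
print(shlex.join(sde_cmd))
print(shlex.join(gdb_cmd))
```

The lists can also be passed straight to subprocess.Popen, but since the port is only known once SDE has started, printing the commands and running them by hand is probably the simpler route.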


You might be able to do the same thing with QEMU, if they've added support for emulating AVX512. QEMU can also act as a GDB remote.

QEMU definitely has configurable instruction-set support, e.g. you could tell it to emulate an x86 with AVX but not AVX2 (like Sandy Bridge). SDE can probably do the same thing.

You could even tell it to emulate something you won't find on real hardware, like AVX2 but not BMI1/2, if you want to verify that your CPUID checks don't assume one feature implies another when that isn't guaranteed.
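To illustrate what "not assuming" looks like, here is a rough Python sketch that checks each feature bit on its own. It assumes you already have the EBX value returned by CPUID leaf 7, subleaf 0 (reading it is outside the sketch); the function names are mine. The bit positions used are BMI1 = 3, AVX2 = 5, BMI2 = 8, AVX512F = 16. A real check also has to confirm OS support for the register state (via XGETBV), which is left out here.

```python
# Bit positions in EBX of CPUID leaf 7, subleaf 0.
LEAF7_EBX_BITS = {
    "bmi1": 3,
    "avx2": 5,
    "bmi2": 8,
    "avx512f": 16,
}


def decode_leaf7_ebx(ebx):
    """Return the set of feature names whose bit is set in ebx."""
    return {name for name, bit in LEAF7_EBX_BITS.items() if (ebx >> bit) & 1}


def supports(ebx, required):
    """True only if every required feature has its own bit set.

    Nothing is inferred: AVX2 being present says nothing about BMI2.
    """
    return set(required) <= decode_leaf7_ebx(ebx)


# Hypothetical emulated CPU: AVX2 set, BMI1/BMI2 clear.
avx2_only = 1 << 5
print(sorted(decode_leaf7_ebx(avx2_only)))
print(supports(avx2_only, ["avx2"]))
print(supports(avx2_only, ["avx2", "bmi2"]))
```

A code path that uses both AVX2 and BMI2 instructions should ask for both by name, which is exactly the kind of thing an emulated "AVX2 but no BMI" CPU will catch if you got it wrong.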


Remember that both emulators are essentially useless for performance testing; they're only good for checking the correctness of your vectorization. IACA could be useful to get an idea of performance on SKX, but it's far from perfect: it models only the actual pipeline, in some level of detail, and doesn't model memory bottlenecks at all.

Peter Cordes answered Jan 05 '23 00:01
