Suppose I have a program that performs AES operations.
Some newer CPUs support the AES-NI instruction set, and other CPUs don't.
Must I compile my program into two executables, A_with_aes_ni.exe and B_without_aes_ni.exe?
An Instruction Set Architecture (ISA) is part of the abstract model of a computer that defines how the CPU is controlled by the software. The ISA acts as an interface between the hardware and the software, specifying both what the processor is capable of doing as well as how it gets done.
Instructions are stored in memory, and the contents of the PC register are used as the starting address from which the next instruction to be executed is read. Because the length of a 68k instruction in bytes can vary, decoding and reading the instruction from memory (steps 1 and 2) is an iterative process.
What you want is called a CPU dispatcher. Agner Fog has 10 pages of text on this in chapter three, "Making critical code in multiple versions for different instruction sets", of his Optimizing C++ manual. He discusses doing this with both GCC and ICC.
You only need one executable, but you need to compile two different object files, one with AES enabled and one without. At runtime the dispatcher determines which instruction set is available and chooses the code path based on that.
I tried to build a CPU dispatcher for AVX and SSE with MSVC2010 (Visual Studio) but did not succeed. I suspect I could get it working now, though.
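To make the idea concrete, here is a minimal dispatcher sketch in C++. It is not taken from Agner Fog's manual. The names encrypt_block_generic, encrypt_block_aesni, select_impl and encrypt_block are hypothetical, and the two implementations are placeholders that only leave a marker byte behind. The detection step assumes GCC or Clang on x86 and uses their __builtin_cpu_supports builtin; on any other toolchain the sketch simply falls back to the generic path.

```cpp
#include <cstdint>

// Signature shared by both builds of the routine (hypothetical).
using EncryptBlockFn = void (*)(std::uint8_t block[16]);

// Placeholders. In a real project each would live in its own source file,
// and only the AES-NI one would be compiled with AES enabled (e.g. -maes).
// Here they just stamp a marker byte so the chosen path is observable.
static void encrypt_block_generic(std::uint8_t block[16]) { block[0] = 0xA0; }
static void encrypt_block_aesni(std::uint8_t block[16])   { block[0] = 0xA1; }

// Assumption: GCC or Clang on x86. Any other toolchain reports "no AES-NI".
static bool cpu_has_aesni() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("aes") != 0;
#else
    return false;
#endif
}

// Pure selection step, kept separate so both paths can be exercised in tests.
static EncryptBlockFn select_impl(bool has_aesni) {
    return has_aesni ? encrypt_block_aesni : encrypt_block_generic;
}

// Public entry point: detection runs once, on the first call.
void encrypt_block(std::uint8_t block[16]) {
    static const EncryptBlockFn impl = select_impl(cpu_has_aesni());
    impl(block);
}
```

Callers only ever see encrypt_block, so the rest of the program does not need to know which instruction set was chosen. Resolving the pointer once in a function-local static keeps the per-call cost to one indirect call.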
Edit:
In Agner Fog's vectorclass he has the files dispatch_example.cpp and instrset_detect.cpp, which should have most of what you need to make a dispatcher. You still need to figure out how to detect whether a CPU has AES, so you need to augment the instrset_detect.cpp file. According to Wikipedia, when you read CPUID with EAX=1, bit 25 in register ECX is set if the CPU has AES. Wikipedia also has code examples for reading CPUID (besides instrset_detect.cpp, another good example is at https://github.com/Mysticial/Flops in the file cpuid.c).
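A rough sketch of that CPUID check, independent of the files mentioned above: AES-NI is reported in CPUID leaf 1 (EAX=1), register ECX, bit 25. The function names ecx_reports_aesni and cpu_has_aesni are hypothetical. The sketch assumes GCC or Clang on x86, where the cpuid.h header provides __get_cpuid; MSVC has a different intrinsic (__cpuid in intrin.h) that is not covered here, so other toolchains conservatively report no AES-NI.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>  // GCC/Clang only: provides __get_cpuid
#define SKETCH_HAVE_GNU_CPUID 1
#endif

// CPUID leaf 1, ECX bit 25 is the AES-NI feature flag.
constexpr std::uint32_t kAesNiBit = 1u << 25;

// Pure helper: decode the flag from a raw ECX value.
constexpr bool ecx_reports_aesni(std::uint32_t ecx) {
    return (ecx & kAesNiBit) != 0;
}

bool cpu_has_aesni() {
#ifdef SKETCH_HAVE_GNU_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) {
        return false;
    }
    return ecx_reports_aesni(ecx);
#else
    return false;  // assumption: unknown toolchain or non-x86 means no AES-NI
#endif
}
```

Keeping the bit test in its own function makes it easy to check against known ECX values without needing a particular CPU.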
One way we do this in Solaris is to have hardware capabilities libraries, which are dynamically loaded at runtime by the linker.
Another option is to first install a trap handler for illegal instructions, then test for the machine instructions you want. If you hit the trap, you know that you can't use the optimised version and have to load the non-optimised (or less optimised) one.
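A minimal sketch of the trap-handler approach on a POSIX system, using sigaction with sigsetjmp/siglongjmp. This is an illustration, not how Solaris does it internally. The names runs_without_sigill, probe_aesenc and cpu_executes_aesni are hypothetical, the inline assembly assumes GCC or Clang on x86-64, and the global jump buffer means the probe should run once at startup, before any other threads exist.

```cpp
#include <setjmp.h>  // sigjmp_buf, sigsetjmp, siglongjmp (POSIX)
#include <signal.h>  // sigaction, SIGILL (POSIX)

static sigjmp_buf g_probe_env;

// SIGILL handler: abandon the probe and jump back to the sigsetjmp point.
static void on_sigill(int) {
    siglongjmp(g_probe_env, 1);
}

// Runs `probe` once. Returns false if it raised SIGILL, true otherwise.
// The previous SIGILL disposition is restored before returning.
bool runs_without_sigill(void (*probe)()) {
    struct sigaction sa = {};
    struct sigaction old_sa = {};
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old_sa) != 0) {
        return false;  // could not install the handler: assume unsupported
    }

    bool ok;
    // Second argument 1: save the signal mask so siglongjmp restores it.
    if (sigsetjmp(g_probe_env, 1) == 0) {
        probe();
        ok = true;   // probe returned normally
    } else {
        ok = false;  // arrived here via siglongjmp from the handler
    }
    sigaction(SIGILL, &old_sa, nullptr);
    return ok;
}

#if defined(__x86_64__) && defined(__GNUC__)
// Executes a single AES-NI instruction; the result is discarded.
static void probe_aesenc() {
    __asm__ volatile("aesenc %%xmm0, %%xmm0" : : : "xmm0");
}
#endif

bool cpu_executes_aesni() {
#if defined(__x86_64__) && defined(__GNUC__)
    return runs_without_sigill(probe_aesenc);
#else
    return false;  // assumption: no probe available on this target
#endif
}
```

The advantage of probing the instruction itself is that it tests exactly what the optimised code will execute; the cost is process-wide signal handling, which needs care in libraries and multithreaded programs.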
While I like Andrew's suggestion above, I think it's safer to test for the specific instructions that you need. That way you don't have to keep updating your app for newer CPUID output.
Edited to add:
I realise I should have provided an example. For Solaris' libc on the x64
platform, we provide hw-optimised versions of the library: three are for
32-bit, one for 64-bit. We can see the differences by running elfdump -H
on the file of interest:
s11u1:jmcp $ elfdump -H /usr/lib/libc/libc_hwcap1.so.1

Capabilities Section:  .SUNW_cap

 Object Capabilities:
     index  tag            value
       [0]  CA_SUNW_HW_1   0x86d  [ SSE MMX CMOV SEP CX8 FPU ]

 Symbol Capabilities:
     index  tag            value
       [2]  CA_SUNW_ID     hrt
       [3]  CA_SUNW_HW_1   0x40002  [ TSCP TSC ]

  Symbols:
     index  value       size        type  bind  oth  ver  shndx  name
       [1]  0x000f306c  0x00000225  FUNC  LOCL  D    0    .text  gettimeofday%hrt
       [2]  0x000f2efc  0x00000165  FUNC  LOCL  D    0    .text  gethrtime%hrt

Capabilities Chain Section:  .SUNW_capchain

 Capabilities family: gettimeofday
  chainndx  symndx  name
         1  [702]   gettimeofday
         2  [1]     gettimeofday%hrt

 Capabilities family: gethrtime
  chainndx  symndx  name
         4  [1939]  gethrtime
         5  [2]     gethrtime%hrt
s11u1:jmcp $ elfdump -H /usr/lib/libc/libc_hwcap2.so.1

Capabilities Section:  .SUNW_cap

 Object Capabilities:
     index  tag            value
       [0]  CA_SUNW_HW_1   0x1875  [ SSE2 SSE MMX CMOV AMD_SYSC CX8 FPU ]

 Symbol Capabilities:
     index  tag            value
       [2]  CA_SUNW_ID     hrt
       [3]  CA_SUNW_HW_1   0x40002  [ TSCP TSC ]

  Symbols:
     index  value       size        type  bind  oth  ver  shndx  name
       [1]  0x000f253c  0x00000225  FUNC  LOCL  D    0    .text  gettimeofday%hrt
       [2]  0x000f23cc  0x00000165  FUNC  LOCL  D    0    .text  gethrtime%hrt

Capabilities Chain Section:  .SUNW_capchain

 Capabilities family: gettimeofday
  chainndx  symndx  name
         1  [702]   gettimeofday
         2  [1]     gettimeofday%hrt

 Capabilities family: gethrtime
  chainndx  symndx  name
         4  [1939]  gethrtime
         5  [2]     gethrtime%hrt
Guess which of the above is for AMD systems, and which for Intel?
The Solaris linker has smarts to load the correct hwcap library at runtime before your process' _init() is called.