
SSE2 vectorization and virtual machines

I am considering vectorizing some floor() calls using SSE2 intrinsics and then measuring the performance gain. Ultimately, though, the binary is going to run on a virtual machine that I have no access to.

I don't really know how a VM works. Is a binary entirely executed on a software-emulated virtual CPU?

If not, and supposing the VM runs on a CPU with SSE2, could the VM use the host CPU's SSE2 instructions when executing an SSE2 instruction from my binary?

Could my vectorization be beneficial on the VM?

ThreeStarProgrammer57 asked Jan 18 '17 22:01


2 Answers

I don't really know how a VM works. Is a binary entirely executed on a software-emulated virtual CPU?

For serious purposes, no, because full emulation is too slow. (Bochs, for example, does emulate everything, which can be useful for kernel debugging, among other things.)

The binary is executed "normally", directly on the host CPU, as much as possible. This generally means that any code that doesn't try to interact with the OS is executed directly. By contrast, operations such as system calls are likely to require the involvement of the VM implementation.

If not, and supposing the VM runs on a CPU with SSE2, could the VM use the host CPU's SSE2 instructions when executing an SSE2 instruction from my binary?

Yes.

Could my vectorization be beneficial on the VM?

Yes.
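
For reference, here is a minimal sketch of what a vectorized floor() could look like with SSE2 only. SSE2 has no packed rounding instruction (that arrived with SSE4.1), so this sketch truncates toward zero and then subtracts 1 in the lanes where truncation rounded up. The names floor_ps_sse2 and floor_array_sse2 are hypothetical, and the code assumes the inputs are finite and fit in a 32-bit integer; it is not meant as the one correct way to do this.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stddef.h>

/* floor() of four packed floats using only SSE/SSE2 instructions.
   Assumption: inputs are finite and fit in int32 (|x| < 2^31). */
static __m128 floor_ps_sse2(__m128 x)
{
    /* Truncate toward zero, then convert back to float. */
    __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(x));
    /* Truncation rounds negative non-integers up; those lanes have t > x. */
    __m128 too_big = _mm_cmpgt_ps(t, x);
    /* Mask AND 1.0f gives 1.0f in lanes needing correction, 0.0f elsewhere. */
    return _mm_sub_ps(t, _mm_and_ps(too_big, _mm_set1_ps(1.0f)));
}

/* Hypothetical helper: floor n floats from in[] into out[]. */
static void floor_array_sse2(const float *in, float *out, size_t n)
{
    size_t i = 0;
    assert(in != NULL && out != NULL);
    for (; i + 4 <= n; i += 4) {
        /* Unaligned load/store so callers need no special alignment. */
        _mm_storeu_ps(out + i, floor_ps_sse2(_mm_loadu_ps(in + i)));
    }
    for (; i < n; i++) {  /* scalar tail, same truncate-and-correct idea */
        float t = (float)(int)in[i];
        out[i] = (t > in[i]) ? t - 1.0f : t;
    }
}
```

Since these are ordinary unprivileged instructions, the same binary should run at essentially the same speed in a hardware-virtualized guest as on the bare host, so a gain measured locally is a reasonable first estimate.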

user253751 answered Nov 16 '22 13:11


It depends on the VM technology and the CPU's capabilities. The first x86 VMs (like VMware on 32-bit machines) used recompilation: they scanned the guest's binary code for harmful instructions (such as ones accessing raw memory or special registers) and replaced them with hypercalls.

Since SSE2 instructions are not harmful, they would just be left as is, with no performance penalty added by the VM. Moreover, modern x86 CPUs provide "hardware virtualization", which makes recompilation unnecessary. Harmful instructions are caught by the CPU and trap to the hypervisor, but again, SSE2 instructions shouldn't trigger that.
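
If you cannot inspect the target VM, the binary can ask the (virtual) CPU itself. The sketch below assumes GCC or Clang on x86 and uses the compiler-provided <cpuid.h> helper; the function names cpu_has_sse2 and running_under_hypervisor are hypothetical. CPUID leaf 1 reports SSE2 in EDX bit 26, and ECX bit 31 is the "hypervisor present" bit that hypervisors set by convention. Note that SSE2 is part of the x86-64 baseline, so the first check only matters for 32-bit builds.

```c
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */

/* Returns 1 if CPUID reports SSE2 support (leaf 1, EDX bit 26), else 0. */
static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 1 not available */
    return (int)((edx >> 26) & 1u);
}

/* Returns 1 if the "hypervisor present" bit is set (leaf 1, ECX bit 31).
   The bit is a convention: a hypervisor may choose not to expose it. */
static int running_under_hypervisor(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 31) & 1u);
}
```

A program could call cpu_has_sse2() once at startup and fall back to a scalar floor() loop if it returns 0.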

There are, of course, full processor emulators like QEMU (not QEMU-KVM) or Bochs, but that's a different story. A Bochs-emulated CPU, for example, is about 1000 times slower than the host CPU.

myaut answered Nov 16 '22 12:11