
Poor performance of VirtualBox under Win10

We have a build system which, until a few weeks ago, was taking about 1.5 hours for a full build of each target device.

At some point, that went up to about 3.5 hours per target. Because we build for about nine different targets, our total build time has increased from about fourteen hours to about thirty-two.

We think we've finally established where the issue lies. A VM running on my Win10 box (the guest is Ubuntu 16.04) was copied over to a Win7 box. The VM was totally unchanged in terms of its setup, what type of disks it ran on, and so on. The machine was also very similarly specced out (same CPU, disks, etc).

As an aside, I was originally running VirtualBox 5.x while the Win7 box had 6.0.12 but I don't think that's the issue since an upgrade to 6.0.14 on my box made no change. Even moving the VM disk across to an SSD on my box gave little relief, meaning it's almost certainly CPU bound.

The Win7 box running the VM did each build in about 1.5 hours.

Then the only change we made to that box was an in-place upgrade to Win10 and, lo and behold, the builds are now also taking 3.5 hours each.

A little research shows that there are a few people having issues with VirtualBox/Win10 as both host and guest but the advice given (video memory increase, re-balancing of CPUs/memory between host and guest, enabling/disabling video acceleration, etc) doesn't seem to fix anything.

We are mulling over a few ideas such as:

  • running Ubuntu on bare metal but that obviously makes it harder to move VMs around;
  • running these Ubuntu guests on top of Linux hosts so, assuming the issue is Win10, we get a performance improvement while still allowing VM mobility;
  • sticking with Win10 but using VMware Player rather than VirtualBox (currently testing to see if this is viable);
  • reverting to Win7 for the build boxes, but I don't think IT are going to be happy with that proposal (i.e., not a snowflake's hope in hell of that being approved :-) ).

Does anyone have any other ideas on how to go forward?

asked Jan 26 '23 15:01 by paxdiablo



1 Answer

We've been doing some investigation and it turns out that the culprit is the Spectre2/Meltdown mitigation introduced in Windows 10.

We had found from a few web sites that the impact varied but that build server farms and developer boxes were hit hardest (see here, for example).

When we turned off the mitigation with the Gibson Research InSpectre tool (after airgapping the machine for safety, of course), the builds once again came down to an hour and a half per target.

Now we just need to figure out how to go forward on this. We may well have to build on airgapped machines that already have the source ready to go.


Some further details. All of our developer machines are CPUID 306c3 Haswell which is one that was hit particularly hard by the mitigations. We are going to test it on a more modern processor, CPUID 810f10 (an AMD Ryzen 5) to see if the impact is less.

If so, we may opt to purchase a couple of those. In either case, this answer will be updated with the results.


Hopefully this will be the final update. Although we originally succeeded in regaining speed by disabling the Spectre/Meltdown mitigations in the Windows host, this was not really a viable solution given the possibility of being hacked.

Further investigation seemed to show that, while VirtualBox suffered in this environment, VMware did not. So we went looking for something to explain the difference.

Eventually, we encountered this thread which described a similar problem and, on trying one of the proposed solutions there, we found that we could regain the speed without compromising the host OS.

The solution is to run the following command while your VM is shut down (not suspended):

vboxmanage modifyvm VM_NAME --spec-ctrl on

where VM_NAME should be replaced by your actual VM name (obtained with vboxmanage list vms). Then, after restarting the VM, it should once again run at normal speed.
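For anyone wanting to script this, here is a minimal sketch of the same step with a guard that refuses to touch a VM that is not fully powered off. It assumes a POSIX shell with VBoxManage on the PATH (for example Git Bash on a Windows host). The function names vm_state and enable_spec_ctrl are my own inventions for illustration, not part of VirtualBox, and treating only the "poweroff" state as safe is a simplifying assumption.

```shell
#!/bin/sh
# Sketch only: vm_state and enable_spec_ctrl are hypothetical helper names.

# Print the VM's current state (e.g. poweroff, saved, running).
# showvminfo --machinereadable emits lines such as VMState="poweroff";
# tr strips the carriage returns that Windows builds of VBoxManage emit.
vm_state() {
    VBoxManage showvminfo "$1" --machinereadable |
        tr -d '\r' |
        sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# Turn on --spec-ctrl, but only if the VM is fully shut down.
# A suspended VM reports "saved", so it is rejected here.
enable_spec_ctrl() {
    vm="$1"
    state=$(vm_state "$vm")
    if [ "$state" != "poweroff" ]; then
        echo "VM '$vm' is in state '$state'; shut it down fully first" >&2
        return 1
    fi
    VBoxManage modifyvm "$vm" --spec-ctrl on
}
```

Usage would be something like enable_spec_ctrl "build-vm" followed by VBoxManage startvm "build-vm", where build-vm is a placeholder for whatever vboxmanage list vms reports on your machine.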

Unfortunately, this means my business case for getting new Threadripper PCs for the whole development team has now collapsed. Damn you, Internet :-)

answered Mar 06 '23 20:03 by paxdiablo
