We have a build system which, until a few weeks ago, was taking about 1.5 hours for a full build of each target device.
At some point, that was bumped up to about 3.5 hours which, because we build for about nine different targets, has increased our total build time from about fourteen hours to about thirty-two.
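As a rough check of those totals, here is a small shell sketch. It assumes exactly nine targets built one after another and uses the approximate per-target times quoted above, so the figures are estimates only. The function name total_hours is made up for illustration.

```shell
# total_hours TARGETS MINUTES_PER_TARGET
# Prints the total sequential build time in hours, truncated to one
# decimal place, using integer arithmetic only (plain sh has no floats).
total_hours() {
    total=$(( $1 * $2 ))
    echo "$(( total / 60 )).$(( total % 60 * 10 / 60 ))"
}

echo "before: $(total_hours 9 90) hours"    # 9 targets at 1.5 hours each
echo "after:  $(total_hours 9 210) hours"   # 9 targets at 3.5 hours each
```

This prints 13.5 and 31.5 hours, which round to the fourteen and thirty-two mentioned above.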
We think we've finally established where the issue lies. A VM running on my Win10 box (the guest is Ubuntu 16.04) was copied over to a Win7 box. The VM was totally unchanged in terms of its setup, what type of disks it ran on, and so on. The machine was also very similarly specced out (same CPU, disks, etc).
As an aside, I was originally running VirtualBox 5.x while the Win7 box had 6.0.12 but I don't think that's the issue since an upgrade to 6.0.14 on my box made no change. Even moving the VM disk across to an SSD on my box gave little relief, meaning it's almost certainly CPU bound.
The Win7 box running the VM did each build in about 1.5 hours.
Then the only change we made to that box was an in-place upgrade to Win10 and, lo and behold, the builds are now also taking 3.5 hours each.
A little research shows that there are a few people having issues with VirtualBox/Win10 as both host and guest but the advice given (video memory increase, re-balancing of CPUs/memory between host and guest, enabling/disabling video acceleration, etc) doesn't seem to fix anything.
We are mulling over a few ideas such as:
Does anyone have any other ideas on how to go forward?
We've been doing some investigation and it turns out that the culprit is the Spectre2/Meltdown mitigation introduced in Windows 10.
We had found from a few web sites that the impact varied but was most hurtful to build server farms and developer boxes (see here, for example):
When turning off the mitigation with the Gibson Research InSpectre tool (after airgapping the machine for safety, of course), the builds once again came down to an hour and a half per target.
Now we just need to figure out how to go forward on this. We may well have to build on airgapped machines that already have the source ready to go.
Some further details. All of our developer machines are CPUID 306c3 Haswell, which is one that was hit particularly hard by the mitigations. We are going to test it on a more modern processor, CPUID 810f10 (an AMD Ryzen 5), to see if the impact is less.
If so, we may opt to purchase a couple of those. In either case, this answer will be updated with the results.
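For anyone wanting to compare their own hardware against those two CPUID values, the signature can be split into family, model and stepping. The following is a minimal sketch assuming the standard x86 layout of the CPUID leaf 1 EAX value and the usual rule for the extended fields. The function name decode_cpuid is hypothetical, and vendor-specific corner cases are not handled.

```shell
# decode_cpuid HEX_SIGNATURE
# Splits a CPUID signature (leaf 1, EAX) into display family, model and
# stepping, printed in decimal.
decode_cpuid() {
    sig=$(( 0x$1 ))
    stepping=$(( sig & 0xf ))
    model=$(( (sig >> 4) & 0xf ))
    family=$(( (sig >> 8) & 0xf ))
    ext_model=$(( (sig >> 16) & 0xf ))
    ext_family=$(( (sig >> 20) & 0xff ))

    # The extended model only counts when the base family is 6 or 15.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) | model ))
    fi
    # The extended family is only added when the base family is 15.
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi
    echo "family=$family model=$model stepping=$stepping"
}

decode_cpuid 306c3
decode_cpuid 810f10
```

For the two signatures above this gives family 6, model 60 (0x3C, a Haswell client part) and family 23, model 17 (AMD family 17h, model 11h).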
Hopefully this will be the final update. Although we originally succeeded in regaining speed by disabling the Spectre/Meltdown mitigations in the Windows host, this was not really a viable solution given the possibility of being hacked.
Further investigation seemed to show that, while VirtualBox suffered in this environment, VMware did not. So we went looking for something to explain the difference.
Eventually, we encountered this thread which described a similar problem and, on trying one of the proposed solutions there, we found that we could regain the speed without compromising the host OS.
The solution is to run the following command while your VM is shut down (not suspended):
vboxmanage modifyvm VM_NAME --spec-ctrl on
where VM_NAME should be replaced by your actual VM name (obtained with vboxmanage list vms). Then, after restarting the VM, it should once again run at normal speed.
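If you have several VMs to fix, the "shut down, not suspended" check can be scripted. This is only a sketch, assuming a POSIX shell on the host (Git Bash on Windows, for example) with vboxmanage on the PATH. The VBOXMANAGE variable and the function names are made up for illustration. It relies on the VMState line printed by vboxmanage showvminfo --machinereadable and only proceeds when that state is poweroff.

```shell
#!/bin/sh
# VBOXMANAGE is a hypothetical override for pointing at a different
# binary; by default the vboxmanage found on the PATH is used.
VBOXMANAGE="${VBOXMANAGE:-vboxmanage}"

# vm_state VM_NAME
# Prints the VMState value (for example poweroff, running or saved).
# Carriage returns are stripped in case the host tool emits CRLF.
vm_state() {
    "$VBOXMANAGE" showvminfo "$1" --machinereadable |
        tr -d '\r' |
        sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# enable_spec_ctrl VM_NAME
# Turns on --spec-ctrl, but only if the VM is fully powered off.
enable_spec_ctrl() {
    state=$(vm_state "$1")
    if [ "$state" != "poweroff" ]; then
        echo "refusing: $1 is in state '${state:-unknown}', shut it down first" >&2
        return 1
    fi
    "$VBOXMANAGE" modifyvm "$1" --spec-ctrl on
}

# Example call, left commented out so that nothing runs by accident:
# enable_spec_ctrl "VM_NAME"
```

A suspended VM reports a state other than poweroff, so the function refuses to touch it, which matches the requirement above.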
Unfortunately, this means my business case for getting new Threadripper PCs for the whole development team has now collapsed. Damn you, Internet :-)