Currently I am testing an application on a server with 64 cores. The server has VirtualBox installed, which can use at most 32 cores (this limit is imposed by VirtualBox). Because I use Mininet to test my application, I need root privileges to run it. I do not have root rights on the server, but I do inside the VM. So my setup is:
Host has 64 cores and runs Ubuntu
VirtualBox VM with Ubuntu, assigned 1 to 32 cores
My application runs on 16 Mininet hosts; every host runs a program that uses multicast and unicast to communicate with the others, but not many requests for now: about 5 requests per host after startup. They start with a 3-second delay to avoid bottlenecks at startup.
My application uses multiple threads, but each instance on a host is independent of the others.
My application is written entirely in Python and uses Python's APScheduler; a simplified sketch of the scheduling follows.
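For context, each instance schedules its requests roughly like this (a simplified sketch; `send_request` is a hypothetical stand-in for the real multicast/unicast logic):

```python
import time
from datetime import datetime, timedelta

from apscheduler.schedulers.background import BackgroundScheduler


def send_request(i):
    # Hypothetical stand-in for the real multicast/unicast request logic.
    print(f"request {i} sent at {datetime.now()}")


scheduler = BackgroundScheduler()
# Roughly 5 requests per host, starting 3 seconds after launch
# to avoid bottlenecks at startup.
start = datetime.now() + timedelta(seconds=3)
for i in range(5):
    scheduler.add_job(send_request, 'date',
                      run_date=start + timedelta(seconds=i),
                      args=[i])
scheduler.start()

time.sleep(10)  # keep the process alive so the background jobs can fire
scheduler.shutdown()
```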
I thought running it with 32 cores would be best. But when I do, everything starts to hang: I get timeouts in APScheduler and the system load is extremely high.
So I tried it with every number of cores between 1 and 32. Here are some examples:
[Plots of CPU load over time for 1, 4, 8, 12, 16, 20, 23, 27, and 32 cores]
The x axis is in half-second steps; the y axis is the CPU load in percent, as reported by `top -b -n 1`. I ran the app with each core count for about 10 minutes. The blue line is the mean CPU load of my application (averaged over all instances on all Mininet hosts), the red line is my application's load, and the green line is the overall system load.
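My logging loop works roughly like this (a simplified sketch; the exact format of the `top` summary line is an assumption about my version of `top`):

```python
import subprocess
import time

# Sample the CPU load every half second with `top -b -n 1` and log the
# summary line; the `%Cpu` prefix is what recent versions of top print.
while True:
    out = subprocess.check_output(['top', '-b', '-n', '1'], text=True)
    for line in out.splitlines():
        if line.lstrip().startswith('%Cpu'):
            print(time.time(), line.strip())
            break
    time.sleep(0.5)
```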
As you can see, the load gets lower up to about 16 cores. With more than 16 cores it gets worse again, and starting at about 23 cores it becomes extremely slow, so slow that the process logging the CPU load is not even called anymore. This is why the graphs in the last diagrams are shorter.
Does anybody have an idea what the problem could be? Is this a known bug in VirtualBox? Is it a Mininet problem, or a Linux problem? How can I find out which parts cause the extreme load?
If you need more info, please write a comment and I will edit the question.
The load on the guest system was never higher than 50%, so I don't think that is the problem.
Is it possible that VMware would be faster?
EDIT: I reviewed the plots and found that the blue line, which describes the mean CPU load of my application (averaged over all instances on all Mininet hosts), actually keeps rising as I go from 1 to 2 to 3 ... to 16 cores, although from 1 to 16 cores it rises only very slowly. Meanwhile the overall system load goes down, which makes sense to me, since Ubuntu can run its tasks on different cores, which is faster as long as there are no shared resources.

So why is the mean increasing? And why does it start to increase exponentially at about 16 cores?
This is common behavior once a program starts running across a processor-socket boundary. In general, you will start to see unpredictable timing behavior once your application executes on cores that reside on different physical processors.
Assuming that your 64-core machine has four processor sockets with 16 cores each, and also assuming that your scheduler is sane and tries to keep an application's threads grouped on the same socket, your application should see good parallel speedup between 1 and 16 cores, but it will start to run poorly once it uses more than 16, since some of those cores must then reside on a separate socket.
This is true for regular machines as well as virtualized ones, but a virtual machine is liable to add yet another layer of unpredictability if its scheduler is not aware of these socket boundaries.
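To test this theory, you could check the socket layout with `lscpu -e` and then pin your application to cores on a single socket and see whether the pathological behavior disappears. A minimal sketch using Linux CPU affinity, assuming (hypothetically) that cores 0-15 share one socket:

```python
import os

# Assumption: cores 0-15 live on the same physical socket; verify the
# real layout with `lscpu -e` or `lstopo` before pinning.
same_socket_cores = set(range(16))

# Restrict this process (and any threads it spawns afterwards) to that
# socket, so the OS scheduler cannot migrate it across the boundary.
os.sched_setaffinity(0, same_socket_cores)

print("running on cores:", sorted(os.sched_getaffinity(0)))
```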