Creating new processes is very slow on some of my machines, and not others.
The machines are all similar, and some of the slow machines are running the exact same workloads on the same hardware and kernel (2.6.32-26, Ubuntu 10.04) as some of the fast machines. Tasks that do not involve process creation run at the same speed on all machines.
For example, this program executes ~50 times slower on the affected machines:
int main()
{
    int i;
    for (i = 0; i < 10000; i++)
    {
        int p = fork();
        if (!p) exit(0);
        waitpid(p);
    }
    return 0;
}
What could be causing task creation to be much slower, and what other differences could I look for in the machines?
Edit 1: Running bash scripts (which spawn a lot of subprocesses) is also very slow on these machines, and strace on the slow scripts shows the slowdown in the clone() kernel call.
Edit 2: vmstat doesn't show any significant differences between the fast and slow machines. They all have more than enough RAM for their workloads and don't go to swap.
Edit 3: I don't see anything suspicious in dmesg.
Edit 4: I'm not sure why this is on Stack Overflow now; I'm not asking about the example program above (it's just there to demonstrate the problem) but about Linux administration/tuning. But if people think it belongs here, cool.
We experienced the same issue with our application stack: massive degradation in application performance, and strace showed longer clone() times. Using your test program across 18 nodes, I reproduced your results on the same 3 nodes where we were seeing slow clone() times. All nodes were provisioned the same way, but with slightly different hardware. We checked the BIOS, vmstat, and vm.overcommit_memory, and replaced the RAM, with no improvement. We then moved our drives to updated hardware and the issue was resolved.
CentOS 5.9 2.6.18-348.1.1.el5 #1 SMP Tue Jan 22 16:19:19 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
"bad" and "good" lspci:
$ diff ../bad_lspci_sort ../good_lspci_sort
< Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 05)
> Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
< Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
> Host bridge: Intel Corporation Xeon E3-1200 v2/Ivy Bridge DRAM Controller (rev 09)
< ISA bridge: Intel Corporation C204 Chipset Family LPC Controller (rev 05)
> ISA bridge: Intel Corporation C202 Chipset Family LPC Controller (rev 05)
< PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b5)
> PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
< VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 04)
> VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
I might start by using strace to see which system calls are being run, and where the slow ones hang. I'm also curious about how you're using waitpid() here. On my systems, the signature for waitpid is
pid_t waitpid(pid_t pid, int *status, int options);
It sort of looks like you mean to use wait(), but you're passing the child's PID where waitpid() expects a pointer to an int status (and omitting the options flags). If the PID ends up being interpreted as a status pointer, I would expect some strange things to happen.