I developed a Python program that does heavy numerical calculations. I run it on a Linux machine with 32 Xeon CPUs, 64 GB RAM, and Ubuntu 14.04 64-bit. I launch multiple Python instances in parallel, each with different model parameters, so that I can use multiple processes without having to worry about the global interpreter lock (GIL). When I monitor the CPU utilization using htop, I see that all cores are used, but most of the time by the kernel. Generally, the kernel time is more than twice the user time. I'm afraid there is a lot of overhead going on at the system level, but I can't find the cause of it.
How would one reduce the high kernel CPU usage?
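For reference, the way I launch the instances looks roughly like the sketch below (simplified; run_model.py and the parameter grid are placeholders, not my actual model code):

import itertools
import subprocess

# One independent python3 process per parameter combination; no threads,
# so the GIL should not be an issue.
param_grid = itertools.product([0.1, 0.5, 1.0], [10, 100, 1000])
procs = [subprocess.Popen(["python3", "run_model.py", str(a), str(b)])
         for a, b in param_grid]
for p in procs:
    p.wait()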
Here are some observations I made:

- I tried sys.setcheckinterval(10000) (for Python 2) and sys.setswitchinterval(10) (for Python 3), but neither of these helped.
- I tried schedtool -B PID, but this didn't help either.

Edit:

Here is a screenshot of htop:

[htop screenshot omitted]

I also ran perf record -a -g, and this is the report produced by perf report -g graph:
Samples: 1M of event 'cycles', Event count (approx.): 1114297095227
-  95.25%  python3  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.01% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   2.06%  python3  [kernel.kallsyms]   [k] sha_transform
   - sha_transform
      - 2.06% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   0.74%  python3  [kernel.kallsyms]   [k] _mix_pool_bytes
   - _mix_pool_bytes
      - 0.74% __mix_pool_bytes
           extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
    0.44%  python3  [kernel.kallsyms]                            [k] extract_buf
    0.15%  python3  python3.4                                    [.] 0x000000000004b055
    0.10%  python3  [kernel.kallsyms]                            [k] memset
    0.09%  python3  [kernel.kallsyms]                            [k] copy_user_generic_string
    0.07%  python3  multiarray.cpython-34m-x86_64-linux-gnu.so   [.] 0x00000000000b4134
    0.06%  python3  [kernel.kallsyms]                            [k] _raw_spin_unlock_irqrestore
    0.06%  python3  python3.4                                    [.] PyEval_EvalFrameEx
It seems as if most of the time is spent calling _raw_spin_lock_irqsave. I have no idea what this means, though.
If the problem is in the kernel, you should narrow it down using a profiler such as OProfile or perf. That is, run perf record -a -g and then read the profiling data saved in perf.data with perf report. See also: linux perf: how to interpret and find hotspots.
In your case, the high kernel CPU usage is caused by contention for /dev/urandom: only one thread may read from it at a time, but multiple Python processes are doing so. The Python random module uses it only for initialization (seeding). That is:
$ strace python -c 'import random;
while True:
    random.random()'
open("/dev/urandom", O_RDONLY)      = 4
read(4, "\16\36\366\36}"..., 2500)  = 2500
close(4)                            <--- /dev/urandom is closed
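To make the difference concrete (a minimal sketch, not taken from the question's code): the module-level random functions only touch /dev/urandom once, at seeding time, whereas os.urandom and SystemRandom go to /dev/urandom on every call:

import os
import random

x = random.random()        # Mersenne Twister, seeded once from /dev/urandom
sr = random.SystemRandom()
y = sr.random()            # reads /dev/urandom on every call
b = os.urandom(16)         # likewise one /dev/urandom read per call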
You may also explicitly read from /dev/urandom by using os.urandom or the SystemRandom class, so check the parts of your code that deal with random numbers.
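If SystemRandom or os.urandom does show up in your hot path, one possible workaround (a sketch, assuming the simulation does not need cryptographic-quality randomness and that NumPy is available, as the perf output suggests) is to seed an ordinary pseudo-random generator once per process and draw everything from it:

import os
import random

import numpy as np

# One /dev/urandom read per process at start-up; all subsequent draws are
# pure user-space computation.
seed = int.from_bytes(os.urandom(4), "little")
rng = np.random.RandomState(seed)       # or random.Random(seed) without NumPy

samples = rng.standard_normal(1000000)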