I'm trying to find out why the following very simple, minimal example takes ~33% of CPU usage on my i7-5500U CPU, Windows 10 computer:
import time
import numpy as np
import scipy.linalg
import cProfile
class CPUTest:
def __init__(self):
self.running = True
def compute_stuff(self):
dims = 150
A = np.random.random((dims, dims))
B = scipy.linalg.inv(np.dot(A.T, A))
def run(self):
prev_time = time.time()
start_time = prev_time
while self.running:
time.sleep(0.3)
st = time.time()
self.compute_stuff()
et = time.time()
print 'Time for the whole iteration, inc. sleep: %.3f (ms), whereas the processing segment took %.3f (ms): ' % ((st - prev_time) * 1000, (et - st) * 1000)
prev_time = st
if st - start_time > 10.0:
break
t = CPUTest()
t.run()
# cProfile.run('t.run()')
The compute_stuff function takes only 2ms, the rest of the time the program is sleeping. As the sleep should not use CPU, this program should, in theory, run with only a 0.6% CPU usage, but it is currently taking around 30%.
I've tried a profiler, which confirms that the program is in sleep 9.79 seconds out of the 10.
Can someone please provide a hint on why is python behaving this way? What would be an alternative to reduce the CPU usage.
Thanks a lot!
Edit
In summary, the program is asleep >97% of the time, and I still get a CPU usage of 33%. I would like to reduce the CPU usage without sacrificing the frequency for computations.
Here you can find an example of the program output:
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 1.000 (ms):
Time for the whole iteration, inc. sleep: 301.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 3.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 1.000 (ms):
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 302.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 1.000 (ms):
Time for the whole iteration, inc. sleep: 301.000 (ms), whereas the processing segment took 2.000 (ms):
Time for the whole iteration, inc. sleep: 303.000 (ms), whereas the processing segment took 1.000 (ms):
and this is the output from the profiler:
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 10.050 10.050 <string>:1(<module>)
1 0.019 0.019 0.021 0.021 __init__.py:133(<module>)
1 0.067 0.067 0.119 0.119 __init__.py:205(<module>)
1 0.000 0.000 0.000 0.000 _components.py:1(<module>)
1 0.000 0.000 0.000 0.000 _laplacian.py:3(<module>)
49 0.000 0.000 0.000 0.000 _methods.py:37(_any)
49 0.000 0.000 0.001 0.000 _methods.py:40(_all)
49 0.011 0.000 0.137 0.003 _util.py:141(_asarray_validated)
1 0.001 0.001 0.001 0.001 _validation.py:1(<module>)
1 0.000 0.000 0.000 0.000 _version.py:114(_compare)
1 0.000 0.000 0.000 0.000 _version.py:148(__gt__)
2 0.000 0.000 0.000 0.000 _version.py:55(__init__)
1 0.000 0.000 0.000 0.000 _version.py:78(_compare_version)
1 0.008 0.008 0.009 0.009 base.py:1(<module>)
1 0.000 0.000 0.000 0.000 base.py:15(SparseWarning)
1 0.000 0.000 0.000 0.000 base.py:19(SparseFormatWarning)
1 0.000 0.000 0.000 0.000 base.py:23(SparseEfficiencyWarning)
1 0.000 0.000 0.000 0.000 base.py:61(spmatrix)
49 0.000 0.000 0.000 0.000 base.py:887(isspmatrix)
49 0.043 0.001 0.185 0.004 basic.py:619(inv)
49 0.000 0.000 0.001 0.000 blas.py:177(find_best_blas_type)
49 0.001 0.000 0.002 0.000 blas.py:223(_get_funcs)
1 0.000 0.000 0.000 0.000 bsr.py:1(<module>)
1 0.000 0.000 0.000 0.000 bsr.py:22(bsr_matrix)
1 0.012 0.012 0.012 0.012 compressed.py:1(<module>)
1 0.000 0.000 0.000 0.000 compressed.py:21(_cs_matrix)
1 0.000 0.000 0.000 0.000 construct.py:2(<module>)
1 0.000 0.000 0.000 0.000 coo.py:1(<module>)
1 0.000 0.000 0.000 0.000 coo.py:21(coo_matrix)
49 0.000 0.000 0.000 0.000 core.py:5960(isMaskedArray)
49 0.001 0.000 0.242 0.005 cpuTests.py:10(compute_stuff)
1 0.013 0.013 10.050 10.050 cpuTests.py:15(run)
1 0.000 0.000 0.000 0.000 csc.py:1(<module>)
1 0.000 0.000 0.000 0.000 csc.py:19(csc_matrix)
1 0.008 0.008 0.020 0.020 csr.py:1(<module>)
1 0.000 0.000 0.000 0.000 csr.py:21(csr_matrix)
18 0.000 0.000 0.000 0.000 data.py:106(_create_method)
1 0.000 0.000 0.000 0.000 data.py:121(_minmax_mixin)
1 0.000 0.000 0.000 0.000 data.py:22(_data_matrix)
1 0.000 0.000 0.000 0.000 data.py:7(<module>)
1 0.000 0.000 0.000 0.000 dia.py:1(<module>)
1 0.000 0.000 0.000 0.000 dia.py:17(dia_matrix)
1 0.000 0.000 0.000 0.000 dok.py:1(<module>)
1 0.000 0.000 0.000 0.000 dok.py:29(dok_matrix)
1 0.000 0.000 0.000 0.000 extract.py:2(<module>)
49 0.000 0.000 0.001 0.000 fromnumeric.py:1887(any)
49 0.005 0.000 0.006 0.000 function_base.py:605(asarray_chkfinite)
49 0.000 0.000 0.000 0.000 getlimits.py:245(__init__)
49 0.000 0.000 0.000 0.000 getlimits.py:270(max)
49 0.000 0.000 0.002 0.000 lapack.py:405(get_lapack_funcs)
49 0.002 0.000 0.003 0.000 lapack.py:447(_compute_lwork)
1 0.000 0.000 0.000 0.000 lil.py:19(lil_matrix)
1 0.002 0.002 0.002 0.002 lil.py:2(<module>)
49 0.000 0.000 0.000 0.000 misc.py:169(_datacopied)
3 0.000 0.000 0.000 0.000 nosetester.py:181(__init__)
3 0.000 0.000 0.000 0.000 ntpath.py:174(split)
3 0.000 0.000 0.000 0.000 ntpath.py:213(dirname)
3 0.000 0.000 0.000 0.000 ntpath.py:96(splitdrive)
49 0.000 0.000 0.000 0.000 numeric.py:406(asarray)
49 0.000 0.000 0.000 0.000 numeric.py:476(asanyarray)
98 0.000 0.000 0.000 0.000 numerictypes.py:942(_can_coerce_all)
49 0.000 0.000 0.000 0.000 numerictypes.py:964(find_common_type)
5 0.000 0.000 0.000 0.000 re.py:138(match)
2 0.000 0.000 0.000 0.000 re.py:143(search)
7 0.000 0.000 0.000 0.000 re.py:230(_compile)
1 0.000 0.000 0.000 0.000 sputils.py:2(<module>)
1 0.000 0.000 0.000 0.000 sputils.py:227(IndexMixin)
3 0.000 0.000 0.000 0.000 sre_compile.py:228(_compile_charset)
3 0.000 0.000 0.000 0.000 sre_compile.py:256(_optimize_charset)
3 0.000 0.000 0.000 0.000 sre_compile.py:433(_compile_info)
6 0.000 0.000 0.000 0.000 sre_compile.py:546(isstring)
3 0.000 0.000 0.000 0.000 sre_compile.py:552(_code)
3 0.000 0.000 0.000 0.000 sre_compile.py:567(compile)
3 0.000 0.000 0.000 0.000 sre_compile.py:64(_compile)
7 0.000 0.000 0.000 0.000 sre_parse.py:149(append)
3 0.000 0.000 0.000 0.000 sre_parse.py:151(getwidth)
3 0.000 0.000 0.000 0.000 sre_parse.py:189(__init__)
16 0.000 0.000 0.000 0.000 sre_parse.py:193(__next)
3 0.000 0.000 0.000 0.000 sre_parse.py:206(match)
13 0.000 0.000 0.000 0.000 sre_parse.py:212(get)
3 0.000 0.000 0.000 0.000 sre_parse.py:268(_escape)
3 0.000 0.000 0.000 0.000 sre_parse.py:317(_parse_sub)
3 0.000 0.000 0.000 0.000 sre_parse.py:395(_parse)
3 0.000 0.000 0.000 0.000 sre_parse.py:67(__init__)
3 0.000 0.000 0.000 0.000 sre_parse.py:706(parse)
3 0.000 0.000 0.000 0.000 sre_parse.py:92(__init__)
1 0.000 0.000 0.000 0.000 utils.py:117(deprecate)
1 0.000 0.000 0.000 0.000 utils.py:51(_set_function_name)
1 0.000 0.000 0.000 0.000 utils.py:68(__init__)
1 0.000 0.000 0.000 0.000 utils.py:73(__call__)
3 0.000 0.000 0.000 0.000 {_sre.compile}
1 0.000 0.000 0.000 0.000 {dir}
343 0.000 0.000 0.000 0.000 {getattr}
3 0.000 0.000 0.000 0.000 {hasattr}
158 0.000 0.000 0.000 0.000 {isinstance}
270 0.000 0.000 0.000 0.000 {len}
49 0.000 0.000 0.001 0.000 {method 'all' of 'numpy.ndarray' objects}
49 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects}
211 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
49 0.000 0.000 0.000 0.000 {method 'astype' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5 0.000 0.000 0.000 0.000 {method 'end' of '_sre.SRE_Match' objects}
6 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
3 0.000 0.000 0.000 0.000 {method 'find' of 'bytearray' objects}
205 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'group' of '_sre.SRE_Match' objects}
49 0.000 0.000 0.000 0.000 {method 'index' of 'list' objects}
3 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
5 0.000 0.000 0.000 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
49 0.021 0.000 0.021 0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
98 0.001 0.000 0.001 0.000 {method 'reduce' of 'numpy.ufunc' objects}
3 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
2 0.000 0.000 0.000 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
60 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
6 0.000 0.000 0.000 0.000 {min}
147 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
49 0.036 0.001 0.036 0.001 {numpy.core.multiarray.dot}
4 0.000 0.000 0.000 0.000 {ord}
18 0.000 0.000 0.000 0.000 {setattr}
3 0.000 0.000 0.000 0.000 {sys._getframe}
49 9.794 0.200 9.794 0.200 {time.sleep}
99 0.000 0.000 0.000 0.000 {time.time}
Second Edit
I have implemented the equivalent C++ version (below). The C++ version does have the behaviour I was expecting: it is using only 0.3% to 0.5% of CPU usage!
#include <iostream>
#include <chrono>
#include <random>
#include <thread>
// Tune this values to get a computation lasting from 2 to 10ms
#define DIMS 50
#define MULTS 20
/*
This function will compute MULTS times matrix multiplications of transposed(A)*A
We simply want to waste enough time doing computations (tuned to waste between 2ms and 10ms)
*/
double compute_stuff(double A[][DIMS], double B[][DIMS]) {
double res = 0.0;
for (int k = 0; k < MULTS; k++) {
for (int i = 0; i < DIMS; i++) {
for (int j = 0; j < DIMS; j++) {
B[i][j] = 0.0;
for (int l = 0; l < DIMS; l++) {
B[i][j] += A[l][j] * A[j][l];
}
}
}
// We store the result from the matrix B
for (int i = 0; i < DIMS; i++) {
for (int j = 0; j < DIMS; j++) {
A[i][j] = B[i][j];
}
}
}
for (int i = 0; i < DIMS; i++) {
for (int j = 0; j < DIMS; j++) {
res += A[i][j];
}
}
return res;
}
int main() {
std::cout << "Running main" << std::endl;
double A[DIMS][DIMS]; // Data buffer for a random matrix
double B[DIMS][DIMS]; // Data buffer for intermediate computations
std::default_random_engine generator;
std::normal_distribution<double> distribution(0.0, 1.0);
for (int i = 0; i < DIMS; i++) {
for (int j = 0; j < DIMS; j++) {
A[i][j] = distribution(generator);
}
}
bool keep_running = true;
auto prev_time = std::chrono::high_resolution_clock::now();
auto start_time = prev_time;
while (keep_running)
{
std::this_thread::sleep_for(std::chrono::milliseconds(300));
auto st = std::chrono::high_resolution_clock::now();
double res = compute_stuff(A, B);
auto et = std::chrono::high_resolution_clock::now();
auto iteration_time = std::chrono::duration_cast<std::chrono::milliseconds>(st - prev_time).count();
auto computation_time = std::chrono::duration_cast<std::chrono::milliseconds>(et - st).count();
auto elapsed_time = std::chrono::duration_cast<std::chrono::milliseconds>(et - start_time).count();
std::cout << "Time for the whole iteration, inc. sleep:" << iteration_time << " (ms), whereas the processing segment took " << computation_time << "(ms)" << std::endl;
keep_running = elapsed_time < 10 * 1000;
prev_time = st;
}
}
Here you can also see the output from the C++ equivalent program:
Time for the whole iteration, inc. sleep:314 (ms), whereas the processing segment took 7(ms)
Time for the whole iteration, inc. sleep:317 (ms), whereas the processing segment took 7(ms)
Time for the whole iteration, inc. sleep:316 (ms), whereas the processing segment took 8(ms)
Time for the whole iteration, inc. sleep:316 (ms), whereas the processing segment took 7(ms)
Time for the whole iteration, inc. sleep:314 (ms), whereas the processing segment took 10(ms)
There seems to be something python specific going on. The same behaviour has been confirmed in 3 machines (linux and Windows)
I figure out this problem, when i wrote a program for a game.
And i realized even if i create a while infinite loop that just print a hello world msg still cpu usage of my program is 30%.
So i use time.sleep(0.05) in the begining of my while loop and end of that.
And my problem solved. Just play with sleep in your loop. I think can be done.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With