The code below reproduces the problem I have encountered in the algorithm I'm currently implementing:
import numpy as np
import numpy.random as rand
import time

x = rand.normal(size=(300, 50000))
y = rand.normal(size=(300, 50000))

for i in range(1000):
    t0 = time.time()
    y *= x                              # element-wise multiply in place
    print("%.4f" % (time.time() - t0))  # time spent on the multiply
    y /= y.max()                        # to prevent overflows
The problem is that after some number of iterations the loop gradually slows down, until a single iteration takes several times longer than it did initially.
[Plot: per-iteration time, showing the gradual slowdown]
CPU usage by the Python process is stable around 17-18% the whole time.
I'm using:
As @Alok pointed out, this seems to be caused by denormal numbers hurting performance. I ran it on my OSX system and confirmed the issue. I don't know of a way to flush denormals to zero in numpy, so I would get around this issue in the algorithm by avoiding the very small numbers: do you really need to keep dividing y until it gets down to the 1e-324 level?
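If you do need values that small, you can at least confirm the diagnosis and clear the offending entries by hand. Here is a minimal sketch (the flush_denormals helper is my own illustration, not a numpy API) that uses np.finfo(...).tiny, the smallest positive normal value, as the threshold:

import numpy as np

# Sketch (not numpy API): count denormals and flush them to zero by hand.
# np.finfo(...).tiny is the smallest positive *normal* value (~2.2e-308 for
# float64); any nonzero value below it in magnitude is denormal/subnormal.
def flush_denormals(a):
    tiny = np.finfo(a.dtype).tiny
    mask = np.abs(a) < tiny          # catches denormals (and zeros, harmlessly)
    n = np.count_nonzero(mask & (a != 0))
    a[mask] = 0.0                    # flush to zero in place
    return n

a = np.array([1.0, 1e-320, 0.0, -1e-310])
print(flush_denormals(a), a)         # -> 2 [1. 0. 0. 0.]

Doing this inside the loop adds its own cost, but it keeps the multiplications on the fast hardware path.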
If you avoid the very low numbers, e.g. by adding the following line in your loop:
y += 1e-100
then you'll have a constant time per iteration (albeit slower because of the extra operation). Another workaround is to use higher-precision arithmetic, e.g.:
x = rand.normal(size=(300,50000)).astype('longdouble')
y = rand.normal(size=(300,50000)).astype('longdouble')
This will make each of your steps more expensive, but each step will take roughly the same amount of time.
[Comparison of per-iteration times on my system]
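If you want to run that comparison yourself, a sketch along the following lines works; the run helper and its labels are my own illustration, not from the original code, and note that on some platforms numpy's 'longdouble' is just an alias for float64, in which case the last variant won't differ from the first:

import time
import numpy.random as rand

# Illustrative benchmark (not from the original post): time n_iter iterations
# of the question's update for a given dtype, optionally adding 1e-100 each
# step to keep values out of the denormal range.
def run(dtype, add_eps=False, n_iter=1000):
    x = rand.normal(size=(300, 50000)).astype(dtype)
    y = rand.normal(size=(300, 50000)).astype(dtype)
    times = []
    for _ in range(n_iter):
        t0 = time.time()
        y *= x
        if add_eps:
            y += 1e-100            # the workaround from above
        times.append(time.time() - t0)
        y /= y.max()               # same overflow guard as in the question
    return min(times), max(times)

for label, dtype, add_eps in [
    ("float64",          "float64",    False),
    ("float64 + 1e-100", "float64",    True),
    ("longdouble",       "longdouble", False),
]:
    fastest, slowest = run(dtype, add_eps)
    print("%-16s fastest %.4f s, slowest %.4f s" % (label, fastest, slowest))

A large gap between the fastest and slowest iteration for plain float64, with a flat profile for the other two variants, is the denormal signature described above. Reduce n_iter or the array size if the full run takes too long on your machine.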