I have a Python code which imports a 4-column txt file of numbers: the first three columns are the x, y, z coordinates and the fourth column is the density at that coordinate.
Below is the code that reads the file, converts it to an ndarray, Fourier transforms that field, calculates the distance from the origin (k=(0,0,0)) to each transformed coordinate, averages over shells, and plots the result. Thanks to pandas (a Python library for data analysis) and the Python FFT, loading 256^3 rows and Fourier transforming them is very fast and done in a few seconds.
However, transforming the loaded txt into a numpy ndarray, calculating the average density (the average of the values at each coordinate), and calculating the distance from the origin (k=(0,0,0)) take a very long time.
I think the problem is the np.around part at the end, but I can't figure out how to optimize it.
I have a 32-core machine available.
Could someone show me how to speed this up, make it a multiprocess code, or something like that, so those steps can be done quickly? Thanks.
(If you are a cosmologist and ever need this code, you may use it, but please contact me if you can. Thanks.)
from __future__ import division
import numpy as np
ngridx = 128
ngridy = 128
ngridz = 128
maxK = max(ngridx,ngridy,ngridz)
#making input file
f = np.zeros((ngridx*ngridy*ngridz,4))
i = 0
for i in np.arange(len(f)):
    f[i][0] = int(i/(ngridy*ngridz))
    f[i][1] = int((i/ngridz))%ngridy
    f[i][2] = int(i%ngridz)
    f[i][3] = np.random.rand(1)
    if i%1000000 ==0:
        print i
#This takes forever
#end making input file
#Thanks to Mike,
a = f[:,3].reshape(ngridx,ngridy,ngridz)
avg = np.sum(f[:,3])/len(f)
a /= avg
p = np.fft.fftn(a)
#This part is much much faster than before (Original Post).
#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p
#This is just a convention for the Fourier transform, so you can ignore this part
kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )
kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )
kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )
kx = np.zeros((ngridx,ngridy,ngridz))
ky = np.zeros((ngridx,ngridy,ngridz))
kz = np.zeros((ngridx,ngridy,ngridz))
rangecolx = np.arange(ngridx)
rangecoly = np.arange(ngridy)
rangecolz = np.arange(ngridz)
for row in np.arange(ngridx):
    for column in np.arange(ngridy):
        for height in np.arange(ngridz):
            kx[row][column][height] = (kValx[row])
            ky[row][column][height] = (kValy[column])
            kz[row][column][height] = (kValz[height])
    if row%10 == 0:
        print row
print 'wavenumber generate complete!'
#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)
#by taking the spherical shell of thickness 1 and averaging out the values inside it.
#I am sure that this process can be optimised somehow, but I gave up.
qlen = maxK/2 #Nyquist frequency
q = np.zeros(((qlen),4),dtype=complex)
#q is a four column array with length maxK/2.
#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))
#q[:,1] is the sum of square of the fourier transformed value
#q[:,2] is the sum of the fourier transformed value,
#and q[:,3] is the total number of samples with K=q[:,0]
for i in np.arange(len(q)):
    q[i][0] = i
i = 0
for i in np.arange(len(p)):
    for r in np.arange(len(p[0])):
        for s in np.arange(len(p[0,0])):
            K = np.around(np.sqrt(kx[i,r,s]**2+ky[i,r,s]**2+kz[i,r,s]**2))
            if K < qlen:
                q[K][1]=q[K][1]+np.abs(p[i,r,s])**2
                q[K][2]=q[K][2]+p[i,r,s]
                q[K][3]=q[K][3]+1
    if i%10 ==0:
        print 'i = ',i,' !'
print q
Numpy can typically do things hundreds of times faster than plain Python, with very little extra effort (and sometimes it will even automatically use multiple cores with no work from you). You just have to know the right ways to write your code. Just to name the first things I think of:
One mistake that I think we all make is trying to write only in code, even when we're still trying to figure out what the code should do — basically, speaking a language we don't really understand very well, which isn't really descriptive and has a very limited number of words. I find it better to first outline my approach in plain English as a series of steps. This lets me see clearly, and in my own words, exactly what I want the code to do. So it's easy to get an overview and think about what's going on, but also easy to rewrite if I realize that something needs to be reworked.
In fact, I write the steps as comments on separate lines in a code file. Once I'm happy with the approach, I start writing the code itself between those comments. This leaves me with a well structured and properly commented code.
In the particular example of the original code in the OP, the entire f array probably doesn't need to exist. If you were just writing out the procedure in plain English, I think you would see this, and be able to change your approach easily in plain English before you code it up. As it stands, so much of the code relies on f that it would require a complete rewrite to change it.
Plain python is often very slow at things that a computer should be great at. One example is with indexing, so a line like
a[f[i,0]][f[i,1]][f[i,2]]=f[i,3]
makes me very suspicious. Is this the one you're referring to when you say "transforming the loaded txt to numpy ndarray" takes a very long time? That wouldn't surprise me, because each time python sees a[f[i,0]], it has to index f first, making sure that i is an integer and that you haven't run off the edge of f; then it has to make sure f[i,0] is an integer and that you haven't run off the edge of a. Then it has to repeat this two more times before it even knows which element you want to set.
One improvement is to use a[f[i,0],f[i,1],f[i,2]], because numpy is faster with that kind of indexing.
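Going one step further (just a sketch, and it assumes the first three columns of f really are integer grid indices), numpy's advanced indexing can fill the whole array in a single vectorized assignment, with no python-level loop over rows at all:
ix = f[:,0].astype(int)
iy = f[:,1].astype(int)
iz = f[:,2].astype(int)
a = np.empty((ngridx, ngridy, ngridz))
a[ix, iy, iz] = f[:,3]   # one fancy-indexed assignment replaces the per-row loop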
But I assume your data are actually in some kind of order. For example, does f[i,2] loop from 0 to 256, then f[i,1] increments by 1, and f[i,2] starts over at 0? If so, all you actually need to do is to say something like
a = f[:,3].reshape(ngridx,ngridy,ngridz)
That's a ridiculously fast operation, taking a fraction of a millisecond. The shape might be wrong, so you might have to change the order of the arguments or do something with transposes, but the basic idea is definitely there. You can read about it in the documentation.
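If you want to convince yourself that the ordering is right, a tiny check on a throwaway grid (not part of the original code) does the trick:
nx, ny, nz = 2, 3, 4
flat = np.arange(nx*ny*nz)          # stand-in for the density column f[:,3]
demo = flat.reshape(nx, ny, nz)     # numpy's default C order: the last index varies fastest
assert demo[1, 2, 3] == 1*ny*nz + 2*nz + 3   # x slowest, z fastest, like the loop that built f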
More specifically, if I understand the code correctly, those first three sections of code could be completely replaced by just
a = np.random.rand(ngridx,ngridy,ngridz)
That is, f doesn't need to exist at all, as far as I can see.
You don't need to copy everything, and when you do need to copy an array (or part of an array), you should let numpy do it for you. For example, instead of your Firstdel function, just use a[1:]:
def Firstdel(a):
    return a[1:]
This doesn't copy the data at all; the thing it returns just ignores the fact that another number is stored in RAM immediately before it. Or, if you really need to make a copy of the data (which you don't need just for plotting), use this:
def Firstdel(a):
    return numpy.copy(a[1:])
But typically, you can just use "slices" of numpy arrays rather than copying them; see the numpy documentation on indexing and slicing.
However, also be aware that if you just take a slice (and no copying happens), that means that if you write to the slice, the original array will also be written to.
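A tiny illustration of that view-versus-copy distinction, using a throwaway array rather than the one in the question:
b = np.arange(5)
view = b[1:]              # a view: no data is copied
view[0] = 99              # this also changes b[1]
copied = np.copy(b[1:])   # an independent copy
copied[0] = -1            # b is unchanged by this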
Loops are also notorious time wasters — specifically when those loops are in the python code, rather than some C code. In fact, I would argue that the whole point of numpy is to replace loops that you would write in python with loops that they've written for you in C.
First of all, while is not common in python for simple loops. So instead of
while i < len(f):
    # do stuff
    i = i+1
you should probably use
for i in range(len(f)):
    # do stuff
But this is just an issue of style and readability. For speed, you need to get rid of as many loops as you can, especially nested loops. To set kx, ky, and kz, this code is about 10 times faster than the nested loops for the parameters in the OP, and the number of python-level loop iterations scales as n instead of n^3 (where n = ngridx = ngridy = ngridz):
for row in range(ngridx):
    kx[row,:,:] = kValx[row]
for column in range(ngridy):
    ky[:,column,:] = kValy[column]
for height in range(ngridz):
    kz[:,:,height] = kValz[height]
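(As a side note, and purely optional: numpy can build all three wavenumber grids in a single call, with no explicit python loop at all. With indexing='ij', kx[i,j,k] equals kValx[i], which matches the loops above.)
kx, ky, kz = np.meshgrid(kValx, kValy, kValz, indexing='ij')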
Slicing can also be useful for setting values, because the loop goes inside of numpy. Instead of this code
i = 0
while i < len(q):
    q[i][0] = i
    i = i + 1
just use
q[:,0] = np.arange(len(q))
Here, I'm just setting a "slice" of q equal to another array.
The nested loops following that loop can also be sped up, but they'll be a bit more complicated.
But you also want to avoid loops wherever possible. Which brings us to...
Again, the main reason numpy exists is to translate these slow python loops into fast C code (that we don't need to know exists). So there are a lot of functions already built into numpy that do the things you want to do.
Instead of
while i < len(f):
    masstot = masstot + f[i,3]
    i = i+1
you should use something like
masstot = np.sum(f[:,3])
It's simpler to read, but also likely to be way faster, because numpy has direct access to that data in the computer's memory and can use fast C functions to find the sum, rather than using slow python functions. (Again, you don't need to know anything about the C functions; they'll just do the work.)
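In the same spirit, the average density that the original code computes by hand (np.sum(f[:,3])/len(f)) is already a one-liner in numpy:
avg = np.mean(f[:,3])   # same result as np.sum(f[:,3])/len(f)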
One common mistake is to use numpy functions, but still do the loops themselves. There's a prime example of that in this code:
for i in np.arange(len(p)):
    for r in np.arange(len(p[0])):
        for s in np.arange(len(p[0,0])):
            K = np.around(np.sqrt(kx[i,r,s]**2+ky[i,r,s]**2+kz[i,r,s]**2))
Instead of that big nested loop calculating values of K each time through the loop, just make K an array with the appropriate values:
K = np.around(np.sqrt(kx**2+ky**2+kz**2))
K will be the same size and shape as kx, ky, and kz. Then, you can use advanced indexing to set the values of q. This is how you could do that last section:
# Again, we get rid of nested loops, to get a large improvement in speed and scaling
K = np.around(np.sqrt(kx**2+ky**2+kz**2)).astype(int)
for i in range(qlen):
    p_i = p[K==i] # This will be just the elements of p where K==i
    q[i,0] = i
    q[i,1] = np.sum(np.abs(p_i)**2)
    q[i,2] = np.sum(p_i)
    q[i,3] = len(p_i)
print q
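If even that remaining per-shell loop turns out to matter, one further idea (just a sketch; I haven't benchmarked it against the loop above) is to push the binning into numpy with np.bincount, which sums weights per integer bin. bincount only accepts real weights, so the complex sum is assembled from the real and imaginary parts separately:
K_flat = K.ravel()
counts = np.bincount(K_flat, minlength=qlen)
power = np.bincount(K_flat, weights=(np.abs(p)**2).ravel(), minlength=qlen)
sum_re = np.bincount(K_flat, weights=p.real.ravel(), minlength=qlen)
sum_im = np.bincount(K_flat, weights=p.imag.ravel(), minlength=qlen)
q[:,0] = np.arange(qlen)
q[:,1] = power[:qlen]
q[:,2] = sum_re[:qlen] + 1j*sum_im[:qlen]
q[:,3] = counts[:qlen]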
Putting these tips together, I get about a factor of 70 improvement over the code currently in your question. If it's still too slow, you can use a profiler to find the slow parts. And if you still can't improve them, it might be worth trying numba, which compiles the code as you need it, to run almost as fast as equivalent C code.
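If you do end up reaching for numba, a minimal sketch of the idea (assuming numba is installed; I haven't timed this version) is to compile the shell-accumulation loop itself:
from numba import njit

@njit
def shell_sums(p, K, qlen):
    # Accumulate per-shell power and counts in compiled code instead of python loops.
    power = np.zeros(qlen)
    count = np.zeros(qlen)
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):
            for k in range(p.shape[2]):
                b = K[i, j, k]
                if b < qlen:
                    power[b] += abs(p[i, j, k])**2
                    count[b] += 1
    return power, count

power, count = shell_sums(p, K, int(qlen))   # int() because qlen = maxK/2 is a float with true division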