How to improve numpy performance in this short code?

Question

I am trying to work out why one of my Python scripts is about 4 times slower than the equivalent gfortran code, and I have narrowed it down to this:

import numpy as np

nvar_x=40
nvar_y=10

def fn_tst(x):
    for i in range(int(1e7)):
        y=np.repeat(x,1+nvar_y)
    return y

x = np.arange(40)
y = fn_tst(x)

print(y.min(), y.max())

This is about 13 times slower than the following Fortran code:

module test
integer,parameter::nvar_x=40,nvar_y=10
contains
subroutine fn_tst(x,y)
real,dimension(nvar_x)::x
real,dimension(nvar_x*(1+nvar_y))::y

do i = 1,10000000
   do k = 1,nvar_x
      y(k)=x(k)
      ibeg=nvar_x+(k-1)*nvar_y+1
      iend=ibeg+nvar_y-1
      y(ibeg:iend)=x(k)
   enddo
enddo

end subroutine fn_tst
end module test

program tst_cp
use test
real,dimension(nvar_x)::x
real,dimension(nvar_x*(1+nvar_y))::y
do k = 1,nvar_x
   x(k)=k-1
enddo

call fn_tst(x,y)

print *,minval(y),maxval(y)

stop
end

Can you please suggest ways to speed up the Python script? Other pointers on getting good performance out of numpy would also be appreciated. I'd rather stick with Python than build Python wrappers for Fortran routines.

Thanks

@isedev, so is that it: 1.2s for gfortran vs. 6.3s for Python? This is the first time I've worried about performance but, as I said, I could only get to about a quarter of gfortran's speed with Python in the code I was trying to speed up.

And right, sorry, the two codes were not doing the same thing. Indeed, what you show in the loop is closer to what I have in the original code.

Unless I'm missing something, I do not agree with the last statement (about placing the output directly in an existing array): I have to create y in fn_tst, and np.repeat is just one of the terms on the RHS. If I comment out the np.repeat term, things are fast:

rhs_slow = rhs[:J]
rhs_fast = rhs[J:]

rhs_fast[:] = c* ( b*in2[3:-1] * ( in2[1:-3] - in2[4:]  ) - fast) + hc_ovr_b * np.repeat(slow,K) #slow

isedev · Accepted Answer

For a start, the Python code doesn't generate the same output as the Fortran code. In the Fortran program, y is the sequence 0 to 39, followed by ten 0's, ten 1's, and so on up to ten 39's. The Python code outputs eleven 0's, eleven 1's, and so on up to eleven 39's.
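To make the difference concrete, here is a small sketch of the two layouts. It uses reduced sizes (4 and 2 instead of 40 and 10) purely for readability, and the names y_question and y_fortran are illustrative, not from the original code:

```python
import numpy as np

nvar_x = 4   # illustrative sizes only; the question uses 40 and 10
nvar_y = 2

x = np.arange(nvar_x)

# Layout of the question's Python code: each element repeated 1 + nvar_y times.
y_question = np.repeat(x, 1 + nvar_y)

# Layout of the Fortran code: x itself, then each element repeated nvar_y times.
y_fortran = np.concatenate([x, np.repeat(x, nvar_y)])

print(y_question.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(y_fortran.tolist())   # [0, 1, 2, 3, 0, 0, 1, 1, 2, 2, 3, 3]
```

Both arrays have the same length and the same min and max, which is why printing only y.min() and y.max() does not reveal the mismatch.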

This code produces the same output and performs a similar number of memory allocations as your original code:

import numpy as np

nvar_x = 40
nvar_y = 10

def fn_tst(x):
    for i in range(10000000):
        y = np.empty(nvar_x*(1+nvar_y))
        y[0:nvar_x] = x[0:nvar_x]
        y[nvar_x:] = np.repeat(x,nvar_y)
    return y

x = np.arange(40)
y = fn_tst(x)  # keep the result; otherwise y is undefined below

print(y.min(), y.max())

On my system (with the loop count reduced to 1,000,000), the Fortran code runs in 1.2s and the above Python code in 8.6s.

However, this is not a fair comparison: in the Fortran code, y is allocated once (outside the fn_tst routine), whereas in the Python code, y is allocated inside fn_tst on every iteration.

So, rewriting the Python code as follows provides a better comparison:

import numpy as np

nvar_x = 40
nvar_y = 10

def fn_tst(x,y):
    for i in range(10000000):
        y[0:nvar_x] = x[0:nvar_x]
        y[nvar_x:] = np.repeat(x,nvar_y)
    return y

x = np.arange(40)
y = np.empty(nvar_x*(1+nvar_y))
fn_tst(x,y)

print(y.min(), y.max())

On my system, the above runs in 6.3s (again with 1,000,000 iterations), which is already roughly 25% faster.
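If you want to reproduce this comparison on your own machine, something along these lines should do. It is only a sketch: the function names and the call count are my own choices, the standard-library timeit module stands in for however the numbers above were measured, and absolute timings will differ between systems.

```python
import timeit
import numpy as np

nvar_x = 40
nvar_y = 10
x = np.arange(nvar_x)

def alloc_each_time():
    # Allocates a fresh y on every call, like the first version above.
    y = np.empty(nvar_x * (1 + nvar_y))
    y[:nvar_x] = x
    y[nvar_x:] = np.repeat(x, nvar_y)
    return y

y_shared = np.empty(nvar_x * (1 + nvar_y))  # allocated once, like the Fortran y

def reuse_buffer():
    # Writes into the preallocated array, like the second version above.
    y_shared[:nvar_x] = x
    y_shared[nvar_x:] = np.repeat(x, nvar_y)
    return y_shared

n_calls = 20000  # hypothetical count, far fewer than above, to keep the run short
t_alloc = timeit.timeit(alloc_each_time, number=n_calls)
t_reuse = timeit.timeit(reuse_buffer, number=n_calls)
print("allocate per call: %.3f s" % t_alloc)
print("reuse one buffer:  %.3f s" % t_reuse)
```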

The main performance hit in this case though is that numpy.repeat() is generating an array which then needs to be copied back into y. Things would be much faster if numpy.repeat() could be instructed to place its output directly in an existing array (i.e. y in this case)... but that doesn't appear to be possible.
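One possible workaround, offered as an untested-for-speed sketch rather than a guaranteed improvement: view the tail of y as a (nvar_x, nvar_y) array and let broadcasting write the repeated values straight into it, so no temporary from numpy.repeat() is needed. The function names are mine, and with arrays this small the per-call overhead may dominate, so it would need to be timed on your system.

```python
import numpy as np

nvar_x = 40
nvar_y = 10

def fill_with_repeat(x, y):
    # Approach from above: np.repeat builds a temporary, which is then copied into y.
    y[:nvar_x] = x
    y[nvar_x:] = np.repeat(x, nvar_y)
    return y

def fill_with_broadcast(x, y):
    # Alternative: reshape the contiguous tail of y into a (nvar_x, nvar_y) view
    # and broadcast x down the rows, writing directly into y's memory.
    y[:nvar_x] = x
    y[nvar_x:].reshape(nvar_x, nvar_y)[...] = x[:, np.newaxis]
    return y

x = np.arange(nvar_x)
y_a = fill_with_repeat(x, np.empty(nvar_x * (1 + nvar_y)))
y_b = fill_with_broadcast(x, np.empty(nvar_x * (1 + nvar_y)))
print(np.array_equal(y_a, y_b))  # True
```

This relies on y being a contiguous array, so that the reshape returns a view into y rather than a copy.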

Tags:

performance

python

numpy

fortran

Balu
