I want to interpolate one axis of data inside a 3-dimensional array. The given x-values differ slightly between the individual slices, but they should all be mapped onto the same common x-values.
Since the given x-values are not identical, currently I do the following:
import numpy as np
from scipy import interpolate
axes_have = np.ones( ( 2, 72, 2001 ) )
axes_have *= np.linspace( 0, 100, 2001 )[None,None,:]
axes_have += np.linspace( -0.3, 0.3, 144 ).reshape( ( 2, 72 ) )[:,:,None]
arr = np.sin( axes_have )
arr *= np.random.random( (2, 72 ) )[:,:,None]
axis_want = np.linspace( 0, 100, 201 )
arr_ip = np.zeros( ( 2, 72, 201 ) )
for i in range( arr.shape[0] ):
    for j in range( arr.shape[1] ):
        ip_func = interpolate.PchipInterpolator( axes_have[i,j,:], arr[i,j,:], extrapolate=True )
        arr_ip[i,j,:] = ip_func( axis_want )
Using two nested for-loops is unsurprisingly very slow.
Is there a way to improve the speed, maybe by doing some NumPy array magic or parallelization?
I've done some initial testing and it seems as though the vast bulk of your time is spent generating the actual interpolation functions. It doesn't seem like just vectorization will speed it up a ton, but it does make it easy to parallelize (which does yield speed increases). Here's an example.
import numpy as np
from scipy import interpolate
import timeit
import multiprocessing
def myfunc(arg):
    x, y = arg
    return interpolate.PchipInterpolator(x, y, extrapolate=True)
p = multiprocessing.Pool(processes=8)
axes_have = np.ones((2, 72, 2001))
axes_have *= np.linspace(0, 100, 2001)[None, None, :]
axes_have += np.linspace(-0.3, 0.3, 144).reshape((2, 72))[:, :, None]
arr = np.sin(axes_have)
arr *= np.random.random((2, 72))[:, :, None]
axis_want = np.linspace(0, 100, 201)
arr_ip = np.zeros((2, 72, 201))
s_time1 = timeit.default_timer()
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        ip_func = interpolate.PchipInterpolator(axes_have[i, j, :],
                                                arr[i, j, :],
                                                extrapolate=True)
        arr_ip[i, j, :] = ip_func(axis_want)
elapsed1 = timeit.default_timer() - s_time1
s_time2 = timeit.default_timer()
flatten_have = [y for x in axes_have for y in x]
flatten_arr = [y for x in arr for y in x]
interp_time = timeit.default_timer()
funcs = p.map(myfunc, zip(flatten_have, flatten_arr))
interp_elapsed = timeit.default_timer() - interp_time
arr_ip = np.asarray([func(axis_want) for func in funcs]).reshape(2, 72, 201)
elapsed2 = timeit.default_timer() - s_time2
print("Elapsed 1: {}".format(elapsed1))
print("Elapsed 2: {}".format(elapsed2))
print("Elapsed interp: {}".format(interp_elapsed))
Typical results (note that without the parallelization the vectorized implementation runs at essentially the same speed as the original loop, and that I have 2 cores, so your runtime should be roughly original time / number of cores):
Elapsed 1: 10.4025919437
Elapsed 2: 5.11409401894
Elapsed interp: 5.10987687111
Don't get me wrong, there may be an algorithmic way to do this more quickly, but this seems like the easiest way to yield an immediate speedup, since the problem is embarrassingly parallel.
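One further tweak along the same lines, as a sketch rather than something I've benchmarked: the evaluation step is just as independent per slice as the fitting step, so you can push it into the workers too and only ship the small interpolated arrays back to the parent process instead of the interpolator objects. The helper name fit_and_eval below is mine, and axis_want is assumed to be the same target grid as above.
import numpy as np
from scipy import interpolate
import multiprocessing

def fit_and_eval(arg):
    # Build the PCHIP interpolant for one slice and evaluate it immediately,
    # so only the small (201,) result array is returned to the parent process.
    x, y, x_new = arg
    return interpolate.PchipInterpolator(x, y, extrapolate=True)(x_new)

if __name__ == "__main__":
    axes_have = np.ones((2, 72, 2001))
    axes_have *= np.linspace(0, 100, 2001)[None, None, :]
    axes_have += np.linspace(-0.3, 0.3, 144).reshape((2, 72))[:, :, None]
    arr = np.sin(axes_have) * np.random.random((2, 72))[:, :, None]
    axis_want = np.linspace(0, 100, 201)

    # Flatten the first two axes so each task is one (x, y) pair plus the target grid.
    tasks = [(x, y, axis_want)
             for x, y in zip(axes_have.reshape(-1, 2001), arr.reshape(-1, 2001))]

    with multiprocessing.Pool(processes=8) as p:
        arr_ip = np.asarray(p.map(fit_and_eval, tasks)).reshape(2, 72, 201)
Whether this buys you much over the version above will depend on how expensive pickling the interpolator objects is on your machine, so treat it as a variant to try, not a guaranteed win.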