I've written a routine that interpolates point data onto a regular grid. However, I find that scipy
's implementation of nearest neighbor interpolation performs almost twice as slow as the radial basis function I'm using for linear interpolation (scipy.interpolate.Rbf
)
Relevant code includes how the interpolators are constructed
if interpolation_mode == 'linear':
interpolator = scipy.interpolate.Rbf(
point_array[:, 0], point_array[:, 1], value_array,
function='linear', smooth=.01)
elif interpolation_mode == 'nearest':
interpolator = scipy.interpolate.NearestNDInterpolator(
point_array, value_array)
And when the interpolation is called
result = interpolator(col_coords.ravel(), row_coords.ravel())
The sample I'm running on has 27 input interpolant value points and I'm interpolating across nearly a 20000 X 20000 grid. (I'm doing this in memory block sizes so I'm not exploding the computer btw.)
Below are the result of two cProfile
s I've run on the relevant code. Note that the nearest neighbor scheme runs in 406 seconds while the linear scheme runs in 256 seconds. The nearest scheme is dominated by calls to scipy's kdTree
, which seems reasonable, except the rbf
outperforms it by a significant amount of time. Any ideas why or what I could do to make my nearest scheme run faster than linear?
Linear run:
25362 function calls in 225.886 seconds
Ordered by: internal time
List reduced from 328 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
253 169.302 0.669 207.516 0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(
_euclidean_norm)
258 38.211 0.148 38.211 0.148 {method 'reduce' of 'numpy.ufunc' objects}
252 6.069 0.024 6.069 0.024 {numpy.core._dotblas.dot}
1 5.077 5.077 225.332 225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post2
8+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
252 1.849 0.007 2.137 0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:32
85(meshgrid)
507 1.419 0.003 1.419 0.003 {method 'flatten' of 'numpy.ndarray' objects}
1268 1.368 0.001 1.368 0.001 {numpy.core.multiarray.array}
252 1.018 0.004 1.018 0.004 {_gdal_array.BandRasterIONumPy}
1 0.533 0.533 225.886 225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)
252 0.336 0.001 216.716 0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(
__call__)
Nearest neighbor run:
27539 function calls in 405.624 seconds
Ordered by: internal time
List reduced from 309 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
252 397.806 1.579 397.822 1.579 {method 'query' of 'ckdtree.cKDTree' objects}
252 1.875 0.007 1.881 0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}
252 1.831 0.007 2.101 0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
252 1.034 0.004 400.739 1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)
1 1.009 1.009 405.030 405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
252 0.719 0.003 0.719 0.003 {_gdal_array.BandRasterIONumPy}
1 0.509 0.509 405.624 405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)
252 0.261 0.001 0.261 0.001 {numpy.core.multiarray.copyto}
27 0.125 0.005 0.125 0.005 {_ogr.Layer_CreateFeature}
1 0.116 0.116 0.254 0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)
For reference, I'm also including the visual result of these two test cases.
Nearest
Linear
Running the example in the griddata
doc:
In [47]: def func(x, y):
return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
....:
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]
So we have 1000 scattered points, and will interpolate at 20,000.
In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
method='nearest')
10 loops, best of 3: 83.6 ms per loop
In [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
method='linear')
1 loops, best of 3: 24.6 ms per loop
In [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
method='cubic')
10 loops, best of 3: 42.7 ms per loop
and for the 2 stage interpolators:
In [55]: %%timeit
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())
....:
1 loops, best of 3: 3.89 s per loop
In [56]: %%timeit
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())
....:
10 loops, best of 3: 82.6 ms per loop
In [57]: %%timeit
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())
....
10 loops, best of 3: 25.1 ms per loop
griddata
is actually a 1 step cover call for these last two versions.
griddata
describes its methods as:
nearest
return the value at the data point closest to the point of
interpolation. See NearestNDInterpolator for more details.
Uses scipy.spatial.cKDTree
linear
tesselate the input point set to n-dimensional simplices,
and interpolate linearly on each simplex.
LinearNDInterpolator details are:
The interpolant is constructed by triangulating the
input data with Qhull [R37], and on each triangle
performing linear barycentric interpolation.
cubic (2-D)
return the value determined from a piecewise cubic, continuously
differentiable (C1), and approximately curvature-minimizing
polynomial surface. See CloughTocher2DInterpolator for more details.
Further tests on the 2 stage versions shows that setting up the nearest cKTtree is very fast; most of the time is spent in the 2nd interpolation state.
On the other hand, setting up the triangulated surface takes longer than the linear interpolation.
I don't know enough of the Rbf method to say why that is so much slower. The underlying methods are so different that intuitions developed with simple manual methods of interpolation don't mean much.
Your example starts with fewer scattered points, and interpolates on a much finer grid.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With