Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does scipy linear interpolation run faster than nearest neighbor interpolation?

I've written a routine that interpolates point data onto a regular grid. However, I find that scipy's implementation of nearest neighbor interpolation performs almost twice as slow as the radial basis function I'm using for linear interpolation (scipy.interpolate.Rbf)

Relevant code includes how the interpolators are constructed

if interpolation_mode == 'linear':
    interpolator = scipy.interpolate.Rbf(
        point_array[:, 0], point_array[:, 1], value_array,
        function='linear', smooth=.01)
elif interpolation_mode == 'nearest':
    interpolator = scipy.interpolate.NearestNDInterpolator(
        point_array, value_array)

And when the interpolation is called

result = interpolator(col_coords.ravel(), row_coords.ravel())

The sample I'm running on has 27 input interpolant value points and I'm interpolating across nearly a 20000 X 20000 grid. (I'm doing this in memory block sizes so I'm not exploding the computer btw.)

Below are the result of two cProfiles I've run on the relevant code. Note that the nearest neighbor scheme runs in 406 seconds while the linear scheme runs in 256 seconds. The nearest scheme is dominated by calls to scipy's kdTree, which seems reasonable, except the rbf outperforms it by a significant amount of time. Any ideas why or what I could do to make my nearest scheme run faster than linear?

Linear run:

     25362 function calls in 225.886 seconds

   Ordered by: internal time
   List reduced from 328 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      253  169.302    0.669  207.516    0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(
_euclidean_norm)
      258   38.211    0.148   38.211    0.148 {method 'reduce' of 'numpy.ufunc' objects}
      252    6.069    0.024    6.069    0.024 {numpy.core._dotblas.dot}
        1    5.077    5.077  225.332  225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post2
8+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    1.849    0.007    2.137    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:32
85(meshgrid)
      507    1.419    0.003    1.419    0.003 {method 'flatten' of 'numpy.ndarray' objects}
     1268    1.368    0.001    1.368    0.001 {numpy.core.multiarray.array}
      252    1.018    0.004    1.018    0.004 {_gdal_array.BandRasterIONumPy}
        1    0.533    0.533  225.886  225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.336    0.001  216.716    0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(
__call__)

Nearest neighbor run:

     27539 function calls in 405.624 seconds

   Ordered by: internal time
   List reduced from 309 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      252  397.806    1.579  397.822    1.579 {method 'query' of 'ckdtree.cKDTree' objects}
      252    1.875    0.007    1.881    0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}
      252    1.831    0.007    2.101    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
      252    1.034    0.004  400.739    1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)
        1    1.009    1.009  405.030  405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    0.719    0.003    0.719    0.003 {_gdal_array.BandRasterIONumPy}
        1    0.509    0.509  405.624  405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.261    0.001    0.261    0.001 {numpy.core.multiarray.copyto}
       27    0.125    0.005    0.125    0.005 {_ogr.Layer_CreateFeature}
        1    0.116    0.116    0.254    0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)

For reference, I'm also including the visual result of these two test cases.

Nearest

Nearest

Linear

Linear

like image 487
Rich Avatar asked Sep 29 '15 20:09

Rich


1 Answers

Running the example in the griddata doc:

In [47]: def func(x, y):
          return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
   ....: 
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

So we have 1000 scattered points, and will interpolate at 20,000.

In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='nearest')
10 loops, best of 3: 83.6 ms per loop

In [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='linear')
1 loops, best of 3: 24.6 ms per loop

In [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y), 
    method='cubic')
10 loops, best of 3: 42.7 ms per loop

and for the 2 stage interpolators:

In [55]: %%timeit 
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())
   ....: 
1 loops, best of 3: 3.89 s per loop

In [56]: %%timeit 
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())
   ....: 
10 loops, best of 3: 82.6 ms per loop

In [57]: %%timeit 
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())
 ....
10 loops, best of 3: 25.1 ms per loop

griddata is actually a 1 step cover call for these last two versions.

griddata describes its methods as:

nearest
return the value at the data point closest to the point of
   interpolation. See NearestNDInterpolator for more details.
   Uses scipy.spatial.cKDTree

linear
tesselate the input point set to n-dimensional simplices, 
   and interpolate linearly on each simplex. 
   LinearNDInterpolator details are:
      The interpolant is constructed by triangulating the 
      input data with Qhull [R37], and on each triangle 
      performing linear barycentric interpolation.

cubic (2-D)
return the value determined from a piecewise cubic, continuously 
   differentiable (C1), and approximately curvature-minimizing 
   polynomial surface. See CloughTocher2DInterpolator for more details.

Further tests on the 2 stage versions shows that setting up the nearest cKTtree is very fast; most of the time is spent in the 2nd interpolation state.

On the other hand, setting up the triangulated surface takes longer than the linear interpolation.

I don't know enough of the Rbf method to say why that is so much slower. The underlying methods are so different that intuitions developed with simple manual methods of interpolation don't mean much.

Your example starts with fewer scattered points, and interpolates on a much finer grid.

like image 101
hpaulj Avatar answered Oct 17 '22 14:10

hpaulj