Inverse Distance Weighted (IDW) Interpolation with Python

Tags:

The Question: What is the best way to calculate inverse distance weighted (IDW) interpolation in Python, for point locations?

Some Background: Currently I'm using RPy2 to interface with R and its gstat module. Unfortunately, the gstat module conflicts with arcgisscripting which I got around by running RPy2 based analysis in a separate process. Even if this issue is resolved in a recent/future release, and efficiency can be improved, I'd still like to remove my dependency on installing R.

The gstat website does provide a stand alone executable, which is easier to package with my python script, but I still hope for a Python solution which doesn't require multiple writes to disk and launching external processes. The number of calls to the interpolation function, of separate sets of points and values, can approach 20,000 in the processing I'm performing.

I specifically need to interpolate for points, so using the IDW function in ArcGIS to generate rasters sounds even worse than using R, in terms of performance.....unless there is a way to efficiently mask out only the points I need. Even with this modification, I wouldn't expect performance to be all that great. I will look into this option as another alternative. UPDATE: The problem here is you are tied to the cell size you are using. If you reduce the cell-size to get better accuracy, processing takes a long time. You also need to follow up by extracting by points.....over all an ugly method if you want values for specific points.

I have looked at the scipy documentation, but it doesn't look like there is a straight forward way to calculate IDW.

I'm thinking of rolling my own implementation, possibly using some of the scipy functionality to locate the closest points and calculate distances.

Am I missing something obvious? Is there a python module I haven't seen that does exactly what I want? Is creating my own implementation with the aid of scipy a wise choice?

382

asked Jun 23 '10 19:06

Michael Allan Jackson

2 Answers

changed 20 Oct: this class Invdisttree combines inverse-distance weighting and scipy.spatial.KDTree.
Forget the original brute-force answer; this is imho the method of choice for scattered-data interpolation.

""" invdisttree.py: inverse-distance-weighted interpolation using KDTree     fast, solid, local """ from __future__ import division import numpy as np from scipy.spatial import cKDTree as KDTree     # http://docs.scipy.org/doc/scipy/reference/spatial.html  __date__ = "2010-11-09 Nov"  # weights, doc  #............................................................................... class Invdisttree:     """ inverse-distance-weighted interpolation using KDTree: invdisttree = Invdisttree( X, z )  -- data points, values interpol = invdisttree( q, nnear=3, eps=0, p=1, weights=None, stat=0 )     interpolates z from the 3 points nearest each query point q;     For example, interpol[ a query point q ]     finds the 3 data points nearest q, at distances d1 d2 d3     and returns the IDW average of the values z1 z2 z3         (z1/d1 + z2/d2 + z3/d3)         / (1/d1 + 1/d2 + 1/d3)         = .55 z1 + .27 z2 + .18 z3  for distances 1 2 3      q may be one point, or a batch of points.     eps: approximate nearest, dist <= (1 + eps) * true nearest     p: use 1 / distance**p     weights: optional multipliers for 1 / distance**p, of the same shape as q     stat: accumulate wsum, wn for average weights  How many nearest neighbors should one take ? a) start with 8 11 14 .. 28 in 2d 3d 4d .. 10d; see Wendel's formula b) make 3 runs with nnear= e.g. 6 8 10, and look at the results --     |interpol 6 - interpol 8| etc., or |f - interpol*| if you have f(q).     I find that runtimes don't increase much at all with nnear -- ymmv.  p=1, p=2 ?     p=2 weights nearer points more, farther points less.     In 2d, the circles around query points have areas ~ distance**2,     so p=2 is inverse-area weighting. For example,         (z1/area1 + z2/area2 + z3/area3)         / (1/area1 + 1/area2 + 1/area3)         = .74 z1 + .18 z2 + .08 z3  for distances 1 2 3     Similarly, in 3d, p=3 is inverse-volume weighting.  Scaling:     if different X coordinates measure different things, Euclidean distance     can be way off.  For example, if X0 is in the range 0 to 1     but X1 0 to 1000, the X1 distances will swamp X0;     rescale the data, i.e. make X0.std() ~= X1.std() .  A nice property of IDW is that it's scale-free around query points: if I have values z1 z2 z3 from 3 points at distances d1 d2 d3, the IDW average     (z1/d1 + z2/d2 + z3/d3)     / (1/d1 + 1/d2 + 1/d3) is the same for distances 1 2 3, or 10 20 30 -- only the ratios matter. In contrast, the commonly-used Gaussian kernel exp( - (distance/h)**2 ) is exceedingly sensitive to distance and to h.      """ # anykernel( dj / av dj ) is also scale-free # error analysis, |f(x) - idw(x)| ? todo: regular grid, nnear ndim+1, 2*ndim      def __init__( self, X, z, leafsize=10, stat=0 ):         assert len(X) == len(z), "len(X) %d != len(z) %d" % (len(X), len(z))         self.tree = KDTree( X, leafsize=leafsize )  # build the tree         self.z = z         self.stat = stat         self.wn = 0         self.wsum = None;      def __call__( self, q, nnear=6, eps=0, p=1, weights=None ):             # nnear nearest neighbours of each query point --         q = np.asarray(q)         qdim = q.ndim         if qdim == 1:             q = np.array([q])         if self.wsum is None:             self.wsum = np.zeros(nnear)          self.distances, self.ix = self.tree.query( q, k=nnear, eps=eps )         interpol = np.zeros( (len(self.distances),) + np.shape(self.z[0]) )         jinterpol = 0         for dist, ix in zip( self.distances, self.ix ):             if nnear == 1:                 wz = self.z[ix]             elif dist[0] < 1e-10:                 wz = self.z[ix[0]]             else:  # weight z s by 1/dist --                 w = 1 / dist**p                 if weights is not None:                     w *= weights[ix]  # >= 0                 w /= np.sum(w)                 wz = np.dot( w, self.z[ix] )                 if self.stat:                     self.wn += 1                     self.wsum += w             interpol[jinterpol] = wz             jinterpol += 1         return interpol if qdim > 1  else interpol[0]  #............................................................................... if __name__ == "__main__":     import sys      N = 10000     Ndim = 2     Nask = N  # N Nask 1e5: 24 sec 2d, 27 sec 3d on mac g4 ppc     Nnear = 8  # 8 2d, 11 3d => 5 % chance one-sided -- Wendel, mathoverflow.com     leafsize = 10     eps = .1  # approximate nearest, dist <= (1 + eps) * true nearest     p = 1  # weights ~ 1 / distance**p     cycle = .25     seed = 1      exec "\n".join( sys.argv[1:] )  # python this.py N= ...     np.random.seed(seed )     np.set_printoptions( 3, threshold=100, suppress=True )  # .3f      print "\nInvdisttree:  N %d  Ndim %d  Nask %d  Nnear %d  leafsize %d  eps %.2g  p %.2g" % (         N, Ndim, Nask, Nnear, leafsize, eps, p)      def terrain(x):         """ ~ rolling hills """         return np.sin( (2*np.pi / cycle) * np.mean( x, axis=-1 ))      known = np.random.uniform( size=(N,Ndim) ) ** .5  # 1/(p+1): density x^p     z = terrain( known )     ask = np.random.uniform( size=(Nask,Ndim) )  #...............................................................................     invdisttree = Invdisttree( known, z, leafsize=leafsize, stat=1 )     interpol = invdisttree( ask, nnear=Nnear, eps=eps, p=p )      print "average distances to nearest points: %s" % \         np.mean( invdisttree.distances, axis=0 )     print "average weights: %s" % (invdisttree.wsum / invdisttree.wn)         # see Wikipedia Zipf's law     err = np.abs( terrain(ask) - interpol )     print "average |terrain() - interpolated|: %.2g" % np.mean(err)      # print "interpolate a single point: %.2g" % \     #     invdisttree( known[0], nnear=Nnear, eps=eps )

114

answered Sep 17 '22 15:09

denis

Edit: @Denis is right, a linear Rbf (e.g. scipy.interpolate.Rbf with "function='linear'") isn't the same as IDW...

(Note, all of these will use excessive amounts of memory if you're using a large number of points!)

Here's a simple exampe of IDW:

def simple_idw(x, y, z, xi, yi):     dist = distance_matrix(x,y, xi,yi)      # In IDW, weights are 1 / distance     weights = 1.0 / dist      # Make weights sum to one     weights /= weights.sum(axis=0)      # Multiply the weights for each interpolated point by all observed Z-values     zi = np.dot(weights.T, z)     return zi

Whereas, here's what a linear Rbf would be:

def linear_rbf(x, y, z, xi, yi):     dist = distance_matrix(x,y, xi,yi)      # Mutual pariwise distances between observations     internal_dist = distance_matrix(x,y, x,y)      # Now solve for the weights such that mistfit at the observations is minimized     weights = np.linalg.solve(internal_dist, z)      # Multiply the weights for each interpolated point by the distances     zi =  np.dot(dist.T, weights)     return zi

(Using the distance_matrix function here:)

def distance_matrix(x0, y0, x1, y1):     obs = np.vstack((x0, y0)).T     interp = np.vstack((x1, y1)).T      # Make a distance matrix between pairwise observations     # Note: from <http://stackoverflow.com/questions/1871536>     # (Yay for ufuncs!)     d0 = np.subtract.outer(obs[:,0], interp[:,0])     d1 = np.subtract.outer(obs[:,1], interp[:,1])      return np.hypot(d0, d1)

Putting it all together into a nice copy-paste example yields some quick comparison plots: Homemade IDW example plot
_{(source: jkington at www.geology.wisc.edu)}
Homemade linear RBF example plot
_{(source: jkington at www.geology.wisc.edu)}
Scipy's linear RBF example plot
_{(source: jkington at www.geology.wisc.edu)}

import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import Rbf  def main():     # Setup: Generate data...     n = 10     nx, ny = 50, 50     x, y, z = map(np.random.random, [n, n, n])     xi = np.linspace(x.min(), x.max(), nx)     yi = np.linspace(y.min(), y.max(), ny)     xi, yi = np.meshgrid(xi, yi)     xi, yi = xi.flatten(), yi.flatten()      # Calculate IDW     grid1 = simple_idw(x,y,z,xi,yi)     grid1 = grid1.reshape((ny, nx))      # Calculate scipy's RBF     grid2 = scipy_idw(x,y,z,xi,yi)     grid2 = grid2.reshape((ny, nx))      grid3 = linear_rbf(x,y,z,xi,yi)     print grid3.shape     grid3 = grid3.reshape((ny, nx))       # Comparisons...     plot(x,y,z,grid1)     plt.title('Homemade IDW')      plot(x,y,z,grid2)     plt.title("Scipy's Rbf with function=linear")      plot(x,y,z,grid3)     plt.title('Homemade linear Rbf')      plt.show()  def simple_idw(x, y, z, xi, yi):     dist = distance_matrix(x,y, xi,yi)      # In IDW, weights are 1 / distance     weights = 1.0 / dist      # Make weights sum to one     weights /= weights.sum(axis=0)      # Multiply the weights for each interpolated point by all observed Z-values     zi = np.dot(weights.T, z)     return zi  def linear_rbf(x, y, z, xi, yi):     dist = distance_matrix(x,y, xi,yi)      # Mutual pariwise distances between observations     internal_dist = distance_matrix(x,y, x,y)      # Now solve for the weights such that mistfit at the observations is minimized     weights = np.linalg.solve(internal_dist, z)      # Multiply the weights for each interpolated point by the distances     zi =  np.dot(dist.T, weights)     return zi   def scipy_idw(x, y, z, xi, yi):     interp = Rbf(x, y, z, function='linear')     return interp(xi, yi)  def distance_matrix(x0, y0, x1, y1):     obs = np.vstack((x0, y0)).T     interp = np.vstack((x1, y1)).T      # Make a distance matrix between pairwise observations     # Note: from <http://stackoverflow.com/questions/1871536>     # (Yay for ufuncs!)     d0 = np.subtract.outer(obs[:,0], interp[:,0])     d1 = np.subtract.outer(obs[:,1], interp[:,1])      return np.hypot(d0, d1)   def plot(x,y,z,grid):     plt.figure()     plt.imshow(grid, extent=(x.min(), x.max(), y.max(), y.min()))     plt.hold(True)     plt.scatter(x,y,c=z)     plt.colorbar()  if __name__ == '__main__':     main()

answered Sep 19 '22 15:09

Joe Kington

Related questions
                            
                                Beautiful Soup findAll doesn't find them all
                            
                                How can I decorate an instance method with a decorator class?
                            
                                Peeking in a heap in python
                            
                                Find which python modules are being imported
                            
                                how to check DEBUG true/false in django template - exactly in layout.html [duplicate]
                            
                                Beginner Python Practice? [closed]
                            
                                How to iterate Queue.Queue items in Python?
                            
                                How do you call an instance of a class in Python?
                            
                                Remove first x number of characters from each row in a column of a Python dataframe
                            
                                What is the correct way to change image channel ordering between channels first and channels last?
                            
                                Facebook JSON badly encoded
                            
                                Can I get an item from a PriorityQueue without removing it yet?
                            
                                Python argument parser list of list or tuple of tuples
                            
                                which is faster for load: pickle or hdf5 in python
                            
                                Importing the numpy c-extensions failed
                            
                                Qt programming: More productive in Python or C++?
                            
                                Proper way to install pip on Ubuntu
                            
                                PCA For categorical features?
                            
                                Why does Python compile modules but not the script being run?
                            
                                Mass string replace in python?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Inverse Distance Weighted (IDW) Interpolation with Python

Tags:

python

numpy

scipy

interpolation

spatial

Michael Allan Jackson

People also ask

2 Answers

denis

Joe Kington

Recent Activity

Donate For Us