I have a very large array similar to elevation data of the format:
triplets = ((x0, y0, z0),
(x1, y1, z1),
... ,
(xn, yn, zn))
where x, y, z are all floats in metres. You can create suitable test data matching this format with:
x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.hstack((x, y, z)).reshape((len(x),3))
I want to be able to efficiently find the corresponding z-value for a given (x,y) pair. My research so far leads to more questions. Here's what I've got:
Iterate through all of the triplets:
query = (a, b) # where a, b are the x and y coordinates we're looking for
for i in triplets:
if i[0] == query[0] and i[1] == query[1]:
result = i[2]
Drawbacks: slow; a, b
must exist, which is a problem with comparing floats.
Use scipy.spatial.cKDTree
to find nearest:
points = triplets[:,0:2] # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1] # this returns a tuple, we want the index
query = tree.data[idx]
result = triplets[idx, 2]
Drawbacks: returns nearest point rather than interpolating.
Using interp2d
as per comment:
f = interp2d(x, y, z)
result = f(a, b)
Drawbacks: doesn't work on a large dataset. I get OverflowError: Too many data points to interpolate
when run on real data. (My real data is some 11 million points.)
So the question is: is there any straightforward way of doing this that I'm overlooking? Are there ways to reduce the drawbacks of the above?
Method 1: We generally use the == operator to compare two NumPy arrays to generate a new array object. Call ndarray. all() with the new array object as ndarray to return True if the two NumPy arrays are equivalent.
c_ = <numpy.lib.index_tricks.CClass object> Translates slice objects to concatenation along the second axis. This is short-hand for np. r_['-1,2,0', index expression] , which is useful because of its common occurrence.
If you want to interpolate the result, rather than just find the z value for the nearest neighbour, I would consider doing something like the following:
The code might look something like this:
import numpy as np
from scipy.spatial import cKDTree
# some fake (x, y, z) data
XY = np.random.rand(10000, 2) - 0.5
Z = np.exp(-((XY ** 2).sum(1) / 0.1) ** 2)
# construct a k-d tree from the (x, y) coordinates
tree = cKDTree(XY)
# a random point to query
xy = np.random.rand(2) - 0.5
# find the k nearest neighbours (say, k=3)
distances, indices = tree.query(xy, k=3)
# the z-values for the k nearest neighbours of xy
z_vals = Z[indices]
# take the average of these z-values, weighted by 1 / distance from xy
dw_avg = np.average(z_vals, weights=(1. / distances))
It's worth playing around a bit with the value of k, the number of nearest neighbours to take the average of. This is essentially a crude form of kernel density estimation, where the value of k controls the degree of 'smoothness' you're imposing on the underlying distribution of z-values. A larger k results in more smoothness.
Similarly, you might want to play around with how you weight the contributions of points according to their distance from (xi, yi), depending on how you think similarity in z decreases with increasing x, y distance. For example you might want to weight by (1 / distances ** 2)
rather than (1 / distances)
.
In terms of performance, constructing and searching k-d trees are both very efficient. Bear in mind that you only need to construct the tree once for your dataset, and if necessary you can query multiple points at a time by passing (N, 2) arrays to tree.query()
.
Tools for approximate nearest neighbour searches, such as FLANN, might potentially be quicker, but these are usually more helpful in situations when the dimensionality of your data is very high.
I don't understand your cKDTree code, you got the idx
, why do the for loop again? You can get the result just by result = triplets[idx, 2]
.
from scipy.spatial import cKDTree
x = np.arange(20, 40, dtype=np.float64)
y = np.arange(30, 50, dtype=np.float64)
z = np.random.random(20) * 25.0
triplets = np.hstack((x, y, z)).reshape((len(x),3))
a = 30.1
b = 40.5
points = triplets[:,0:2] # drops the z column
tree = cKDTree(points)
idx = tree.query((a, b))[1] # this returns a tuple, we want the index
result = triplets[idx, 2]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With