I have a 256 x 256 x 32 grid of regularly spaced points spanning x, y, and z, with an associated variable "a". I also have a group of randomly scattered points in a more confined x, y, z space, with an associated variable "b". Essentially, I want to interpolate and extrapolate my random data onto a regularly spaced grid that matches the "a" cube.
I have used scipy's griddata so far to achieve the interpolation, which seems to work fine, but it cannot handle the extrapolation (as far as I know) and the output sharply truncates to NaN values. Whilst researching this problem I came across a couple of people using griddata a second time with 'nearest' as the interpolation method to fill in the NaN values. I tried this but the results don't seem reliable. More appropriate-looking results are obtained if I use a fill_value with 'linear' mode, but at the moment it's more of a fudge because fill_value has to be a constant.
I noticed that MATLAB has a scatteredInterpolant class which seems to do what I want, but I am unable to find an equivalent class in Python, nor figure out how to implement such a routine efficiently in 3D. Any help is greatly appreciated.
The code I am using for the interpolation is below:
import numpy as np
from scipy.interpolate import griddata

x, y, z, b = np.loadtxt(scatteredfile, unpack=True)

# Create cube to match aCube dimensions
xi = np.linspace(-xmax_aCube, xmax_aCube, 256)
yi = np.linspace(-ymax_aCube, ymax_aCube, 256)
zi = np.linspace(zmin_aCube, zmax_aCube, 32)

# Interpolate scattered points onto the regular grid
X, Y, Z = np.meshgrid(xi, yi, zi)
bCube = griddata((x, y, z), b, (X, Y, Z), method='linear')
This discussion applies in any dimensionality. For your 3D case, let's talk about computational geometry first, to understand why part of the region gives NaN from griddata.
The scattered points in your volume make up a convex hull: the smallest convex volume that contains all of your points.
Less formally, the convex hull (which you can compute easily with scipy) is like stretching a balloon over a frame, where the frame corners are the outermost points of your scattered cluster.
At regular grid locations inside the balloon you're surrounded by known points, so you can interpolate to these locations. Outside it, you have to extrapolate.
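As a minimal sketch - reusing the arrays from your code above - you can flag which grid points lie outside the hull with scipy.spatial.Delaunay:

from scipy.spatial import Delaunay

# Triangulate the scattered points; the boundary of the triangulation
# is the convex hull
tri = Delaunay(np.column_stack([x, y, z]))

# find_simplex returns -1 for query points outside the hull - exactly
# the locations where griddata's 'linear' method returns NaN
queryPoints = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
outsideHull = (tri.find_simplex(queryPoints) == -1).reshape(X.shape)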
Extrapolation is hard. There's no general rule for how to do it... it's problem-specific. In that region, algorithms like griddata choose to return NaN - this is the safest way of informing the scientist that they must choose a sensible way of extrapolating.
Let's go through some ways of doing that.
Assign some scalar value outside the hull. In the scipy docs you'll see this is done with:

s = np.mean(b)
bCube = griddata((x, y, z), b, (X, Y, Z), method='linear', fill_value=s)
Cons: This produces a sharp discontinuity in the interpolated field at the hull boundary, heavily biases the mean scalar field value and doesn't respect the functional form of the data.
Assume a value at the corners of your domain - this might be the average value of the scalar field associated with your scattered points - and include those corners as extra data points before interpolating.
In numpy this can be done as follows:

# With a unit cube, and the selected scalar value at all eight corners
x, y, z, b = np.loadtxt(scatteredfile, unpack=True)
s = np.mean(b)
x = np.append(x, [0, 0, 0, 0, 1, 1, 1, 1])
y = np.append(y, [0, 0, 1, 1, 0, 0, 1, 1])
z = np.append(z, [0, 1, 0, 1, 0, 1, 0, 1])
b = np.append(b, [s] * 8)
# drop in the rest of your code
Cons: This produces a sharp discontinuity in gradient of the interpolated field at the hull boundary, fairly heavily biases the mean scalar field value and doesn't respect the functional form of the data.
For each of the regular NaN points, find the nearest non-NaN point and assign that value, as sketched below. This is effective and stable, but crude: your field can end up with patterned features (like stripes or beams radiating out from the hull) which are often visually unappealing or, worse, unacceptable in terms of data smoothness.
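A minimal sketch of this nearest-non-NaN fill, using scipy.ndimage's Euclidean distance transform (the sampling argument would let you account for the coarser z spacing; omitted here for brevity):

from scipy import ndimage

# For every NaN cell, get the indices of the nearest non-NaN cell...
nanMask = np.isnan(bCube)
nearestIdx = ndimage.distance_transform_edt(nanMask, return_distances=False,
                                            return_indices=True)
# ...and copy that cell's value into the NaN location
bFilled = bCube[tuple(nearestIdx)]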
Depending on the density of data, you could use the nearest scattered datapoint instead of the nearest non-NaN regular point. In numpy:

bCube = griddata((x, y, z), b, (X, Y, Z), method='linear', fill_value=np.nan)
bCubeNearest = griddata((x, y, z), b, (X, Y, Z), method='nearest')
# Use nearest interpolation outside the hull, keeping linear interpolation inside
nanMask = np.isnan(bCube)
bCube[nanMask] = bCubeNearest[nanMask]
Using MATLAB's delaunay-based approaches will reveal more powerful methods for achieving something similar in a one-liner, but numpy/scipy looks a bit limited here.
Apologies for the poor explanation in this section - I've never written the algorithm, but I'm sure some research on the natural neighbour technique will get you far.
Use a distance weighting function with some parameter D, which might be similar to, or twice (say), the length of your box. You can adjust. For each NaN location, figure out the distance to each of the scattered points.
# Don't do it this way for anything but small matrices - this is O(N*M)
# and it can be done much more effectively (e.g. MATLAB has a quick
# natural weighting option), but for illustrative purposes:
nanMask = np.isnan(bCube)
queryPoints = np.column_stack([X[nanMask], Y[nanMask], Z[nanMask]])
scatterPoints = np.column_stack([x, y, z])
filled = np.empty(len(queryPoints))
for i, p in enumerate(queryPoints):
    # Inverse distance from this NaN location to each scattered point,
    # normalised on D, gives a [1 x M] vector of weights
    w = 1.0 / (1.0 + np.linalg.norm(scatterPoints - p, axis=1) / D)
    # Multiply weights by the b values, sum, and normalise by the total weight
    filled[i] = np.sum(w * b) / np.sum(w)
bCube[nanMask] = filled
You basically want to end up with a function that smoothly tends to the average intensity of b at a distance D away from the hull, but coincides with the interpolated field at the hull boundary. Away from the boundary it is weighted most strongly towards its nearest points.
Pros: nicely stable and reasonably continuous. Because of the weighting, it is more resilient to noise at single data points than nearest neighbour.
What do you know about the physics? Assume a functional form that represents what you expect the physics to do, then do a least squares (or some equivalent) fit of that form to the scattered data. Use the function to stabilise the extrapolation.
Some examples of physical insight that can help you construct such a function:
b represents the intensity of a laser beam passing through a volume. You expect the entry side to be nominally identical to the outlet, with the other four boundaries at zero intensity. The intensity will have a concentric Gaussian profile.
b is one component of a velocity field in an incompressible fluid. The fluid must be divergence free, so any field produced in the NaN zone must also be divergence free; apply this as a constraint.
b represents temperature in a room. You expect higher temperature at the top, because hot air rises.
b represents lift on an aerofoil, tested over three independent variables. You can look up the lift at stall easily, so know exactly what it'll be in some parts of the space.
Pros/Cons: Get this right and it'll be awesome. Get it wrong, especially with nonlinear functional forms, and it will go very wrong and can lead to very unstable results.
Health warning: you can't assume a functional form, get pretty results, then use them to prove that the functional form is correct. That's just bad science. The form needs to be something well behaved and known independently of your data analysis.
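As a minimal sketch of this approach - assuming, per the laser-beam example above, a concentric Gaussian profile (the beam function and its parameters are hypothetical placeholders for whatever form your physics justifies):

from scipy.optimize import curve_fit

# Hypothetical functional form: a concentric Gaussian beam profile
# (zq is unused in this simple transverse profile)
def beam(xyz, amplitude, width):
    xq, yq, zq = xyz
    return amplitude * np.exp(-(xq**2 + yq**2) / width**2)

# Least squares fit of the assumed form to the scattered data
params, _ = curve_fit(beam, (x, y, z), b, p0=[b.max(), 1.0])

# Use the fitted function to stabilise the extrapolation outside the hull
nanMask = np.isnan(bCube)
bCube[nanMask] = beam((X[nanMask], Y[nanMask], Z[nanMask]), *params)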
If your scatter of points conforms fairly well to a cube shape, one approach could be to use griddata to interpolate onto a regular grid that fits within your point cloud (therefore avoiding NaNs), and then use this regular grid of values as the input to interpn, which does facilitate linear extrapolation (but requires a regular grid as input).
This way you can use griddata as before for all the points within the convex hull of your scatter of points, and you can use interpn to estimate the points that are returned as NaNs.
This is far from perfect, but I think it comes closer to achieving what you are looking for.
Pros: stays entirely within scipy, and keeps griddata's linear interpolation inside the convex hull.
Cons: only works well if the scattered points fill a roughly cuboid region, and linear extrapolation can become unreliable far from the data.
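A minimal sketch of this two-stage approach (the inner-grid resolution of 64 x 64 x 16 is an arbitrary choice):

from scipy.interpolate import griddata, interpn

# Inner regular grid spanning the bounding box of the scattered points.
# If the cloud isn't very cube-like, some corner cells may still come out
# NaN and the box may need shrinking towards the centroid.
xin = np.linspace(x.min(), x.max(), 64)
yin = np.linspace(y.min(), y.max(), 64)
zin = np.linspace(z.min(), z.max(), 16)
Xin, Yin, Zin = np.meshgrid(xin, yin, zin, indexing='ij')
bInner = griddata((x, y, z), b, (Xin, Yin, Zin), method='linear')

# interpn extrapolates linearly when bounds_error=False and fill_value=None
queryPoints = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
bCube = interpn((xin, yin, zin), bInner, queryPoints, method='linear',
                bounds_error=False, fill_value=None).reshape(X.shape)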