Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast linear interpolation in Numpy / Scipy "along a path"

Tags:

People also ask

How do you make interpolation faster in Python?

The simplest solution is to use something which can be vectorized. eg. ynew = function(xnew);simps(ynew,xnew) This is much faster, but depending on the inputs less accurate. Another possibility which is also a lot faster and gives the same results is to implement a low level callable.

What does Scipy interpolate interp1d return?

Interpolate a 1-D function. x and y are arrays of values used to approximate some function f: y = f(x). This class returns a function whose call method uses interpolation to find the value of new points.

What is Scipy interpolate in Python?

Interpolation is a technique of constructing data points between given data points. The scipy. interpolate is a module in Python SciPy consisting of classes, spline functions, and univariate and multivariate interpolation classes. Interpolation is done in many ways some of them are : 1-D Interpolation.


Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minute. I have two kinds of interpolation I'd like to perform. And I'd like to be able to perform each quickly.

So let's set up some data:

import numpy as np from scipy.interpolate import interp1d import pandas as pd import seaborn as sns  np.random.seed(0) N, sigma = 1000., 5  basetemps = 70 + (np.random.randn(N) * sigma) midtemps = 50 + (np.random.randn(N) * sigma) toptemps = 40 + (np.random.randn(N) * sigma) alltemps = np.array([basetemps, midtemps, toptemps]).T # note transpose! trend = np.sin(4 / N * np.arange(N)) * 30 trend = trend[:, np.newaxis]  altitudes = np.array([500, 1500, 4000]).astype(float)  finaltemps = pd.DataFrame(alltemps + trend, columns=altitudes) finaltemps.index.names, finaltemps.columns.names = ['Time'], ['Altitude'] finaltemps.plot() 

Great, so our temperatures look like this: Raw Temperature Data

Interpolate all times to for the same altitude:

I think this one is pretty straightforward. Say I want to get the temperature at an altitude of 1,000 for each time. I can just use built in scipy interpolation methods:

interping_function = interp1d(altitudes, finaltemps.values) interped_to_1000 = interping_function(1000)  fig, ax = plt.subplots(1, 1, figsize=(8, 5)) finaltemps.plot(ax=ax, alpha=0.15) ax.plot(interped_to_1000, label='Interped') ax.legend(loc='best', title=finaltemps.columns.name) 

Temperature with static interp

This works nicely. And let's see about speed:

%%timeit res = interp1d(altitudes, finaltemps.values)(1000) #-> 1000 loops, best of 3: 207 µs per loop 

Interpolate "along a path":

So now I have a second, related problem. Say I know the altitude of a hiking party as a function of time, and I want to compute the temperature at their (moving) location by linearly interpolating my data through time. In particular, the times at which I know the location of the hiking party are the same times at which I know the temperatures at my weather stations. I can do this without too much effort:

location = np.linspace(altitudes[0], altitudes[-1], N) interped_along_path = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc)                                               for i, loc in enumerate(location)])  fig, ax = plt.subplots(1, 1, figsize=(8, 5)) finaltemps.plot(ax=ax, alpha=0.15) ax.plot(interped_along_path, label='Interped') ax.legend(loc='best', title=finaltemps.columns.name) 

Temperature with moving interp

So this works really nicely, but its important to note that the key line above is using list comprehension to hide an enormous amount of work. In the previous case, scipy is creating a single interpolation function for us, and evaluating it once on a large amount of data. In this case, scipy is actually constructing N individual interpolating functions and evaluating each once on a small amount of data. This feels inherently inefficient. There is a for loop lurking here (in the list comprehension) and moreover, this just feels flabby.

Not surprisingly, this is much slower than the previous case:

%%timeit res = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc)                              for i, loc in enumerate(location)]) #-> 10 loops, best of 3: 145 ms per loop 

So the second example runs 1,000 slower than the first. I.e. consistent with the idea that the heavy lifting is the "make a linear interpolation function" step...which is happening 1,000 times in the second example but only once in the first.

So, the question: is there a better way to approach the second problem? For example, is there a good way to set it up with 2-dimensinoal interpolation (which could perhaps handle the case where the times at which the hiking party locations are known are not the times at which the temperatures have been sampled)? Or is there a particularly slick way to handle things here where the times do line up? Or other?