The simplest solution is to use something which can be vectorized. eg. ynew = function(xnew);simps(ynew,xnew) This is much faster, but depending on the inputs less accurate. Another possibility which is also a lot faster and gives the same results is to implement a low level callable.
Interpolate a 1-D function. x and y are arrays of values used to approximate some function f: y = f(x). This class returns a function whose call method uses interpolation to find the value of new points.
Interpolation is a technique of constructing data points between given data points. The scipy. interpolate is a module in Python SciPy consisting of classes, spline functions, and univariate and multivariate interpolation classes. Interpolation is done in many ways some of them are : 1-D Interpolation.
Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minute. I have two kinds of interpolation I'd like to perform. And I'd like to be able to perform each quickly.
So let's set up some data:
import numpy as np from scipy.interpolate import interp1d import pandas as pd import seaborn as sns np.random.seed(0) N, sigma = 1000., 5 basetemps = 70 + (np.random.randn(N) * sigma) midtemps = 50 + (np.random.randn(N) * sigma) toptemps = 40 + (np.random.randn(N) * sigma) alltemps = np.array([basetemps, midtemps, toptemps]).T # note transpose! trend = np.sin(4 / N * np.arange(N)) * 30 trend = trend[:, np.newaxis] altitudes = np.array([500, 1500, 4000]).astype(float) finaltemps = pd.DataFrame(alltemps + trend, columns=altitudes) finaltemps.index.names, finaltemps.columns.names = ['Time'], ['Altitude'] finaltemps.plot()
Great, so our temperatures look like this:
I think this one is pretty straightforward. Say I want to get the temperature at an altitude of 1,000 for each time. I can just use built in scipy
interpolation methods:
interping_function = interp1d(altitudes, finaltemps.values) interped_to_1000 = interping_function(1000) fig, ax = plt.subplots(1, 1, figsize=(8, 5)) finaltemps.plot(ax=ax, alpha=0.15) ax.plot(interped_to_1000, label='Interped') ax.legend(loc='best', title=finaltemps.columns.name)
This works nicely. And let's see about speed:
%%timeit res = interp1d(altitudes, finaltemps.values)(1000) #-> 1000 loops, best of 3: 207 µs per loop
So now I have a second, related problem. Say I know the altitude of a hiking party as a function of time, and I want to compute the temperature at their (moving) location by linearly interpolating my data through time. In particular, the times at which I know the location of the hiking party are the same times at which I know the temperatures at my weather stations. I can do this without too much effort:
location = np.linspace(altitudes[0], altitudes[-1], N) interped_along_path = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc) for i, loc in enumerate(location)]) fig, ax = plt.subplots(1, 1, figsize=(8, 5)) finaltemps.plot(ax=ax, alpha=0.15) ax.plot(interped_along_path, label='Interped') ax.legend(loc='best', title=finaltemps.columns.name)
So this works really nicely, but its important to note that the key line above is using list comprehension to hide an enormous amount of work. In the previous case, scipy
is creating a single interpolation function for us, and evaluating it once on a large amount of data. In this case, scipy
is actually constructing N
individual interpolating functions and evaluating each once on a small amount of data. This feels inherently inefficient. There is a for loop lurking here (in the list comprehension) and moreover, this just feels flabby.
Not surprisingly, this is much slower than the previous case:
%%timeit res = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc) for i, loc in enumerate(location)]) #-> 10 loops, best of 3: 145 ms per loop
So the second example runs 1,000 slower than the first. I.e. consistent with the idea that the heavy lifting is the "make a linear interpolation function" step...which is happening 1,000 times in the second example but only once in the first.
So, the question: is there a better way to approach the second problem? For example, is there a good way to set it up with 2-dimensinoal interpolation (which could perhaps handle the case where the times at which the hiking party locations are known are not the times at which the temperatures have been sampled)? Or is there a particularly slick way to handle things here where the times do line up? Or other?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With