I have 20+ tables similar to table 1. Where all letters represent actual values.
Table 1:
$ / cars |<1 | 2 | 3 | 4+
<10,000 | a | b | c | d
20,000 | e | f | g | h
30,000 | i | j | k | l
40,000+ | m | n | o | p
A user input could be for example, (2.4, 24594) which is a value between f, g, j, and k. My Python function definition and pseudo-code to calculate this bilinear interpolation is as follows.
def bilinear_interpolation( x_in, y_in, x_high, x_low, y_low, y_high ):
# interpolate with respect to x
# interpolate with respect to y
# return result
How should I store the data from table 1 (a file, a dict, tuple of tuples, or dict of lists), so I can perform the bilinear interpolation most efficiently and correctly?
The function interp1d() is used to interpolate a distribution with 1 variable. It takes x and y points and returns a callable function that can be called with new x and returns corresponding y .
Interpolation is a technique in Python used to estimate unknown data points between two known data points. Interpolation is mostly used to impute missing values in the dataframe or series while preprocessing data.
Linear interpolation works the best when we have many points.
The interp1d() function of scipy. interpolate package is used to interpolate a 1-D function. It takes arrays of values such as x and y to approximate some function y = f(x) and then uses interpolation to find the value of new points.
If you want the most computationally efficient solution I can think of and are not restricted to the standard library, then I would recommend scipy/numpy. First, store the a..p array as a 2D numpy array and then both the $4k-10k and 1-4 arrays as 1D numpy arrays. Use scipy's interpolate.interp1d if both 1D arrays are monotonically increasing, or interpolate.bsplrep (bivariate spline representation) if not and your example arrays are as small as your example. Or simply write your own and not bother with scipy. Here are some examples:
# this follows your pseudocode most closely, but it is *not*
# the most efficient since it creates the interpolation
# functions on each call to bilinterp
from scipy import interpolate
import numpy
data = numpy.arange(0., 16.).reshape((4,4)) #2D array
prices = numpy.arange(10000., 50000., 10000.)
cars = numpy.arange(1., 5.)
def bilinterp(price,car):
return interpolate.interp1d(cars, interpolate.interp1d(prices, a)(price))(car)
print bilinterp(22000,2)
The last time I checked (a version of scipy from 2007-ish) it only worked for monotonically increasing arrays of x and y)
for small arrays like this 4x4 array, I think you want to use this: http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplrep.html#scipy.interpolate.bisplrep which will handle more interestingly shaped surfaces and the function only needs to be created once. For larger arrays, I think you want this (not sure if this has the same restrictions as interp1d): http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html#scipy.interpolate.interp2d but they both require a different and more verbose data structure than the three arrays in the example above.
I'd keep a sorted list of the first column, and use the bisect
module in the standard library to look for the values -- it's the best way to get the immediately-lower and immediately-higher indices. Every other column can be kept as another list parallel to this one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With