Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data storage to ease data interpolation in Python

I have 20+ tables similar to table 1. Where all letters represent actual values.

Table 1:
$ / cars |<1 | 2 | 3 | 4+
<10,000  | a | b | c | d
20,000   | e | f | g | h
30,000   | i | j | k | l
40,000+  | m | n | o | p

A user input could be for example, (2.4, 24594) which is a value between f, g, j, and k. My Python function definition and pseudo-code to calculate this bilinear interpolation is as follows.

def bilinear_interpolation( x_in, y_in, x_high, x_low, y_low, y_high ):
   # interpolate with respect to x
   # interpolate with respect to y
   # return result

How should I store the data from table 1 (a file, a dict, tuple of tuples, or dict of lists), so I can perform the bilinear interpolation most efficiently and correctly?

like image 694
dassouki Avatar asked May 24 '09 02:05

dassouki


People also ask

How do you interpolate data in Python?

The function interp1d() is used to interpolate a distribution with 1 variable. It takes x and y points and returns a callable function that can be called with new x and returns corresponding y .

What is interpolation in Python?

Interpolation is a technique in Python used to estimate unknown data points between two known data points. Interpolation is mostly used to impute missing values in the dataframe or series while preprocessing data.

Which interpolation method is best for time series?

Linear interpolation works the best when we have many points.

What is interp1d in Python?

The interp1d() function of scipy. interpolate package is used to interpolate a 1-D function. It takes arrays of values such as x and y to approximate some function y = f(x) and then uses interpolation to find the value of new points.


2 Answers

If you want the most computationally efficient solution I can think of and are not restricted to the standard library, then I would recommend scipy/numpy. First, store the a..p array as a 2D numpy array and then both the $4k-10k and 1-4 arrays as 1D numpy arrays. Use scipy's interpolate.interp1d if both 1D arrays are monotonically increasing, or interpolate.bsplrep (bivariate spline representation) if not and your example arrays are as small as your example. Or simply write your own and not bother with scipy. Here are some examples:

# this follows your pseudocode most closely, but it is *not*
# the most efficient since it creates the interpolation 
# functions on each call to bilinterp
from scipy import interpolate
import numpy
data = numpy.arange(0., 16.).reshape((4,4))  #2D array
prices = numpy.arange(10000., 50000., 10000.)
cars = numpy.arange(1., 5.)
def bilinterp(price,car):
    return interpolate.interp1d(cars, interpolate.interp1d(prices, a)(price))(car)
print bilinterp(22000,2)

The last time I checked (a version of scipy from 2007-ish) it only worked for monotonically increasing arrays of x and y)

for small arrays like this 4x4 array, I think you want to use this: http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplrep.html#scipy.interpolate.bisplrep which will handle more interestingly shaped surfaces and the function only needs to be created once. For larger arrays, I think you want this (not sure if this has the same restrictions as interp1d): http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html#scipy.interpolate.interp2d but they both require a different and more verbose data structure than the three arrays in the example above.

like image 168
Paul Avatar answered Sep 25 '22 01:09

Paul


I'd keep a sorted list of the first column, and use the bisect module in the standard library to look for the values -- it's the best way to get the immediately-lower and immediately-higher indices. Every other column can be kept as another list parallel to this one.

like image 37
Alex Martelli Avatar answered Sep 22 '22 01:09

Alex Martelli