
Interpolate time series, select y value from x

I have been searching for an answer to this for a while, and have gotten close but keep running into errors. There are a lot of similar questions that almost answer this, but I haven't been able to solve it. Any help or a point in the right direction is appreciated.

I have a graph showing temperature as a mostly non-linear function of depth, with the x and y values drawn from a pandas data frame.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = (22.81,  22.81,  22.78,  22.71,  22.55,  22.54,  22.51,  22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)

# Plot details
plt.figure(figsize=(10, 7))
plt.title("Temperature as a Function of Depth")
plt.xlabel("Temperature")
plt.ylabel("Depth")
plt.gca().invert_yaxis()  # depth increases downwards
plt.plot(x, y, linestyle='--', marker='o', color='b')

This gives me a plot like the one below (note the flipped y axis, since it represents depth):

[Image: temperature plotted against depth, with the y axis inverted]

I would like to find the y value at a specific x value of 22.61, which is not one of the original temperature values in the dataset. I've tried the following steps:

np.interp(22.61, x1, y1)

Which gives me a value that I know to be incorrect, as does

s = pd.Series([5,16,23,34,np.nan,61,68,77,86], index=[22.81,22.81,22.78,22.71,22.61,22.55,22.54,22.51,22.37])
s.interpolate(method='index')

where I am trying to just set up a frame and force the interpolation. I also tried

line = plt.plot(x,y)
xvalues = line[0].get_xdata()
yvalues = line[0].get_ydata()
idx = np.where(xvalues==xvalues[3]) ## 3 is the position
yvalues[idx]

but this returns y values for a specific, already-listed x value, rather than an interpolated one.

I hope this is clear enough. I'm brand new to data science and to Stack Overflow, so if I need to rephrase the question, please let me know.

R-Lionheart asked May 29 '18 11:05


2 Answers

You can indeed use the numpy.interp function. As its documentation states:

The x-coordinates of the data points, must be increasing [...]

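So you need to sort both arrays by their x values before calling this function. In the question the x values (temperatures) are listed in decreasing order, which is why the unsorted call returns a wrong value.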

# Sort arrays
xs = np.sort(x)
ys = np.array(y)[np.argsort(x)]

# x coordinate
x0 = 22.61
# interpolated y coordinate
y0 = np.interp(x0, xs, ys)


Complete Code:
import numpy as np
import matplotlib.pyplot as plt

x = (22.81,  22.81,  22.78,  22.71,  22.55,  22.54,  22.51,  22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)

# Sort arrays
xs = np.sort(x)
ys = np.array(y)[np.argsort(x)]

# x coordinate
x0 = 22.61
# interpolated y coordinate
y0 = np.interp(x0, xs, ys)

# Plot details
plt.figure(figsize=(10, 7))
plt.title("Temperature as a Function of Depth")
plt.xlabel("Temperature")
plt.ylabel("Depth")
plt.gca().invert_yaxis()  # depth increases downwards
plt.plot(x, y, linestyle='--', marker='o', color='b')
plt.plot(x0, y0, marker="o", color="C3")  # interpolated point
plt.show()
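
The sort-then-interpolate steps can also be wrapped in a small helper. This is only a sketch: the name interp_unsorted and the choice to return NaN outside the data range are assumptions made for illustration, not part of NumPy.

```python
import numpy as np

def interp_unsorted(x0, x, y):
    """Linearly interpolate y at x0 when x is not sorted in increasing order.

    Hypothetical helper for illustration; it returns NaN for x0 outside
    the range of x instead of clamping to the end values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One permutation, applied to both arrays, keeps the (x, y) pairs together.
    order = np.argsort(x, kind="stable")
    return np.interp(x0, x[order], y[order], left=np.nan, right=np.nan)

x = (22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)

y0 = interp_unsorted(22.61, x, y)
print(y0)  # about 50.875, between the depths 34 and 61
```

Using a single argsort result for both arrays avoids sorting x twice and makes it harder to get the two arrays out of step.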

[Image: the same plot with the interpolated point (x0, y0) marked]

ImportanceOfBeingErnest answered Sep 18 '22 02:09


I think SciPy provides a more intuitive API for this problem. You can then easily continue working with your data in pandas.

import numpy as np
from scipy.interpolate import interp1d
x = np.array((22.81,  22.81,  22.78,  22.71,  22.55,  22.54,  22.51,  22.37))
y = np.array((5, 16, 23, 34, 61, 68, 77, 86))

# fit the interpolation on the original index and values
f = interp1d(x, y, kind='linear')

# perform interpolation for values across the full desired index
f([22.81,22.81,22.78,22.71,22.61,22.55,22.54,22.51,22.37])

Output:

array([16.   , 16.   , 23.   , 34.   , 50.875, 61.   , 68.   , 77.   ,
       86.   ])

interp1d also supports several non-linear kinds of interpolation (quadratic, cubic and so on) through its kind argument. See the scipy.interpolate documentation for details.
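
Here is a sketch of the cubic variant, assuming SciPy is installed. One caveat with this particular data: spline kinds generally need distinct x values, and 22.81 appears twice, so the duplicates are collapsed first. Averaging their y values is just one possible choice (an assumption here), and the variable names are illustrative.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.array((22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37))
y = np.array((5, 16, 23, 34, 61, 68, 77, 86), dtype=float)

# Collapse duplicate x values by averaging their y values.
# np.unique returns the sorted distinct x values; `inverse` maps each
# original point to its position in that sorted array.
x_unique, inverse = np.unique(x, return_inverse=True)
y_mean = np.bincount(inverse, weights=y) / np.bincount(inverse)

f_linear = interp1d(x_unique, y_mean, kind="linear")
f_cubic = interp1d(x_unique, y_mean, kind="cubic")

x0 = 22.61
print(float(f_linear(x0)))  # same straight-line estimate as np.interp
print(float(f_cubic(x0)))   # smooth-curve estimate, generally different
```

The cubic value depends on the neighbouring points as well, so it is worth checking it against the plot before preferring it to the linear one.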

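[Edit]: As @ImportanceOfBeingErnest points out, the x values need to be in increasing order for the interpolation. interp1d sorts them itself by default (assume_sorted=False); np.interp does not, so there you have to sort first.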
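
Since the question also tried pandas, here is a sketch of how the Series.interpolate(method='index') attempt could be made to work. It rests on the same assumptions as the other approaches: the index has to be increasing, and for this sketch the two depths recorded at 22.81 are averaged, which is an illustrative choice rather than a requirement.

```python
import numpy as np
import pandas as pd

x = (22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)

s = pd.Series(y, index=x, dtype=float)

# Average depths that share a temperature; groupby also sorts the index.
s = s.groupby(level=0).mean()

# Add the target temperature as a missing value, keeping the index sorted.
x0 = 22.61
s = s.reindex(s.index.union([x0]))

# method='index' interpolates using the index values as x-coordinates.
result = s.interpolate(method="index")
print(result.loc[x0])  # about 50.875
```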

twolffpiggott answered Sep 17 '22 02:09