I have been searching for an answer to this for a while, and have gotten close but keep running into errors. There are a lot of similar questions that almost answer this, but I haven't been able to solve it. Any help or a point in the right direction is appreciated.
I have a graph showing temperature as a mostly non-linear function of depth, with the x and y values drawn from a pandas data frame.
import matplotlib.pyplot as plt
x = (22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)
# Plot details
plt.figure(figsize=(10, 7))
plt.title("Temperature as a Function of Depth")
plt.xlabel("Temperature")
plt.ylabel("Depth")
plt.gca().invert_yaxis()
plt.plot(x,y, linestyle='--', marker='o', color='b')
Which gives me an image somewhat like this one (note the flipped y axis since I'm talking about depth):
I would like to find the y value at a specific x value of 22.61, which is not one of the original temperature values in the dataset. I've tried the following steps:
import numpy as np
np.interp(22.61, x, y)
Which gives me a value that I know to be incorrect, as does
import numpy as np
import pandas as pd
s = pd.Series([5,16,23,34,np.nan,61,68,77,86], index=[22.81,22.81,22.78,22.71,22.61,22.55,22.54,22.51,22.37])
s.interpolate(method='index')
where I am trying to just set up a frame and force the interpolation. I also tried
line = plt.plot(x,y)
xvalues = line[0].get_xdata()
yvalues = line[0].get_ydata()
idx = np.where(xvalues==xvalues[3]) ## 3 is the position
yvalues[idx]
but this returns y values for a specific, already-listed x value, rather than an interpolated one.
I hope this is clear enough. I'm brand new to data science and to Stack Overflow, so if I need to rephrase the question, please let me know.
You may indeed use the numpy.interp function. As the documentation states:
The x-coordinates of the data points, must be increasing [...]
So you need to sort both arrays by their x values before calling this function.
# Sort arrays
xs = np.sort(x)
ys = np.array(y)[np.argsort(x)]
# x coordinate
x0 = 22.61
# interpolated y coordinate
y0 = np.interp(x0, xs, ys)
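Whether reordering is needed at all can be checked up front. The sketch below is a minimal illustration, not part of the original answer: it tests for a decreasing step with np.diff and, assuming the data is monotonically non-increasing (as it is here), simply reverses both arrays instead of sorting. The names needs_reordering, x_rev and y_rev are made up for this example. Note that the duplicated 22.81 leaves the depth at exactly that temperature ambiguous.

```python
import numpy as np

x = np.array((22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37))
y = np.array((5, 16, 23, 34, 61, 68, 77, 86))

# np.interp documents that the x-coordinates must be increasing;
# any decreasing step means the data has to be reordered first.
needs_reordering = bool(np.any(np.diff(x) < 0))
print(needs_reordering)  # True: x runs from high to low

# Assumption: x never increases, so reversing both arrays is enough.
x_rev, y_rev = x[::-1], y[::-1]

# 22.61 falls between 22.55 (depth 61) and 22.71 (depth 34).
y0_rev = np.interp(22.61, x_rev, y_rev)
print(y0_rev)  # approximately 50.875
```

For data that is not monotonic, reversing is not sufficient and the sort shown above is the safer choice.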
Complete code:
import numpy as np
import matplotlib.pyplot as plt
x = (22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)
# Sort arrays
xs = np.sort(x)
ys = np.array(y)[np.argsort(x)]
# x coordinate
x0 = 22.61
# interpolated y coordinate
y0 = np.interp(x0, xs, ys)
# Plot details
plt.figure(figsize=(10, 7))
plt.title("Temperature as a Function of Depth")
plt.xlabel("Temperature")
plt.ylabel("Depth")
plt.gca().invert_yaxis()
plt.plot(x,y, linestyle='--', marker='o', color='b')
plt.plot(x0,y0, marker="o", color="C3")
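To check the interpolated value without drawing the plot, the sort-then-interpolate step can be wrapped in a small helper. This is only a sketch: the function name interp_unsorted is hypothetical, and it computes the sort order once and reuses it for both arrays.

```python
import numpy as np

def interp_unsorted(x0, x, y):
    """Linearly interpolate y at x0 for data whose x values are not sorted."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")  # one index array, applied to x and y
    return np.interp(x0, x[order], y[order])

x = (22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37)
y = (5, 16, 23, 34, 61, 68, 77, 86)

# 22.61 lies between 22.55 (depth 61) and 22.71 (depth 34):
# 61 + (0.06 / 0.16) * (34 - 61) = 50.875
y0 = interp_unsorted(22.61, x, y)
print(round(float(y0), 3))  # 50.875
```

Passing a list of temperatures as x0 returns an array of interpolated depths, since np.interp accepts array input.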
I think SciPy provides a more intuitive API to solve this problem. You can then easily continue working with your data in pandas.
import numpy as np
from scipy.interpolate import interp1d
x = np.array((22.81, 22.81, 22.78, 22.71, 22.55, 22.54, 22.51, 22.37))
y = np.array((5, 16, 23, 34, 61, 68, 77, 86))
# fit the interpolation on the original index and values
f = interp1d(x, y, kind='linear')
# perform interpolation for values across the full desired index
f([22.81,22.81,22.78,22.71,22.61,22.55,22.54,22.51,22.37])
Output:
array([16.   , 16.   , 23.   , 34.   , 50.875, 61.   , 68.   , 77.   , 86.   ])
You can also choose other, non-linear interpolation methods through the kind argument ('quadratic', 'cubic' and so on). Check out the comprehensive interpolation documentation for more detail.
[Edit]: As @ImportanceOfBeingErnest adds, the arrays need to be sorted on the x axis. interp1d does this itself by default (assume_sorted=False); np.interp does not.