I have the following DataFrame (extract):
import pandas as pd

data = pd.DataFrame(
    [
        [0., -10.88948939, 74.22099994, 1.5, "NW", 0],
        [0.819377018, -10.88948939, 74.22099994, 1.5, "NW", 1],
        [8.47965933, -10.88948939, 74.22099994, 1.5, "NW", 10],
        [15.38036833, -10.88948939, 74.22099994, 1.5, "NW", 20],
    ],
    columns=["Velocity", "X", "Y", "Z", "wind_direction", "wind_speed"],
)
Velocity       X      Y    Z  wind_direction  wind_speed
    0.00  -10.89  74.22  1.5              NW           0
    0.82  -10.89  74.22  1.5              NW           1
    8.48  -10.89  74.22  1.5              NW          10
   15.38  -10.89  74.22  1.5              NW          20
It represents the results of a CFD simulation for a specific coordinate (X, Y, Z) and two boundary conditions (wind_direction and wind_speed).
I would like to estimate Velocity for the same point (X, Y, Z) and the same wind_direction, but at an intermediate wind_speed, say 4.6. I have this additional row in my DataFrame:
NaN -10.89 74.22 1.5 NW 4.6
Now I would like to interpolate to fill the NaN based on the wind_speed. For the example above I would expect to get about 3.88 (3.883489943 with the unrounded values).
The number comes from the linear interpolation:
0.82 + (4.6 - 1)/(10 - 1) * (8.48 - 0.82)
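As a quick check of that arithmetic, the same linear interpolation can be reproduced with np.interp on the full-precision values from the DataFrame above. This is only a minimal sketch; the variable names are picked for illustration.

```python
import numpy as np

# Known simulation results for this point and wind direction
wind_speed = [0.0, 1.0, 10.0, 20.0]
velocity = [0.0, 0.819377018, 8.47965933, 15.38036833]

# Piecewise-linear interpolation at the intermediate wind speed
estimate = float(np.interp(4.6, wind_speed, velocity))
print(round(estimate, 4))  # 3.8835
```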
Any idea? Thanks
UPDATE
I have found a solution to the issue above. The trick is to use groupby and define a function that interpolates over the dataframe that is created by groupby and passed to apply(). In my case, this is the function
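import numpy as np
from scipy import interpolate

def interp(x, wind_speed):
    # x is the sub-DataFrame for one group; interp1d builds a linear interpolant by default
    g = interpolate.interp1d(np.array(x["wind_speed"]), np.array(x["Velocity"]))
    return g(wind_speed)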
and this is my groupby
group = df.groupby("point").apply(interp, wind_speed)
The function interp takes an extra argument, the wind_speed value at which to interpolate; apply() forwards it to the function for each group.
I wonder whether there is a better way to do it.
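One pandas-only alternative worth trying is to append the rows with NaN velocities, use wind_speed as the index, and call interpolate(method="index") within each group. The sketch below assumes a point is identified by the X, Y, Z and wind_direction columns, since the "point" column does not appear in the extract. The names keys and result are made up for illustration, and this has only been reasoned through on the four sample rows.

```python
import numpy as np
import pandas as pd

cols = ["Velocity", "X", "Y", "Z", "wind_direction", "wind_speed"]
data = pd.DataFrame(
    [
        [0.0, -10.88948939, 74.22099994, 1.5, "NW", 0.0],
        [0.819377018, -10.88948939, 74.22099994, 1.5, "NW", 1.0],
        [8.47965933, -10.88948939, 74.22099994, 1.5, "NW", 10.0],
        [15.38036833, -10.88948939, 74.22099994, 1.5, "NW", 20.0],
        [np.nan, -10.88948939, 74.22099994, 1.5, "NW", 4.6],  # row to fill
    ],
    columns=cols,
)

# Assumption: these columns together identify one point and wind direction
keys = ["X", "Y", "Z", "wind_direction"]

# Sort so that wind_speed is increasing inside every group, then make it the
# index: method="index" interpolates against the index values, not row order.
result = data.sort_values(keys + ["wind_speed"]).set_index("wind_speed")
result["Velocity"] = result.groupby(keys)["Velocity"].transform(
    lambda s: s.interpolate(method="index")
)
result = result.reset_index()

print(result[["wind_speed", "Velocity"]])
```

With this layout, several intermediate wind speeds can be filled in one pass, and no target value has to be passed through apply().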
interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)
Parameters: method : {'linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'polynomial', 'spline', 'piecewise_polynomial', ' ...
You can interpolate missing values (NaN) in a pandas DataFrame or Series with interpolate(). Use dropna() and fillna() to remove missing values or to fill them with a specific value.
interpolate() fills NA values in a DataFrame or Series. Rather than hard-coding a fill value, it estimates the missing values with one of several interpolation techniques. axis: 0 fills column by column, 1 fills row by row.
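One detail that matters for the wind_speed case: the default method='linear' treats the values as equally spaced and ignores the index, while method='index' uses the numeric index values. A small sketch with rounded values from the question, where the Series and variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Velocity indexed by wind_speed, with the unknown value at 4.6
s = pd.Series([0.82, np.nan, 8.48], index=[1.0, 4.6, 10.0])

by_position = s.interpolate()                # equally spaced: midpoint of neighbours
by_index = s.interpolate(method="index")     # weighted by distance along the index

print(by_position.iloc[1])  # about 4.65
print(by_index.iloc[1])     # about 3.884
```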