
Pandas interpolate NaNs based on different column

I have the following DataFrame (extract):

import pandas as pd

data = pd.DataFrame(
    [
        [0., -10.88948939, 74.22099994, 1.5, "NW", 0],
        [0.819377018, -10.88948939, 74.22099994, 1.5, "NW", 1],
        [8.47965933, -10.88948939, 74.22099994, 1.5, "NW", 10],
        [15.38036833, -10.88948939, 74.22099994, 1.5, "NW", 20],
    ],
    columns=["Velocity", "X", "Y", "Z", "wind_direction", "wind_speed"],
)

Velocity  X       Y      Z    wind_direction  wind_speed
0.00      -10.89  74.22  1.5  NW              0
0.82      -10.89  74.22  1.5  NW              1
8.48      -10.89  74.22  1.5  NW              10
15.38     -10.89  74.22  1.5  NW              20

It represents the results of a CFD simulation for a specific coordinate (X, Y, Z) and two boundary conditions (wind_direction and wind_speed).

I would like to estimate Velocity for the same point (X, Y, Z) and the same wind_direction, but for an intermediate wind_speed, say 4.6. I have added this row to my DataFrame:

NaN -10.89 74.22 1.5 NW 4.6

Now I would like to interpolate to fill the NaN based on wind_speed. For the example above I would expect to get about 3.88.

The number comes from the linear interpolation:

0.82 + (4.6 - 1)/(10 - 1) * (8.48 - 0.82)
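As a quick check of the arithmetic, the same linear interpolation can be evaluated with the unrounded values from the DataFrame and compared against numpy.interp. This is only a minimal sketch; the variable names are chosen here for illustration.

```python
import numpy as np

# Known wind speeds and the simulated velocities at those speeds
speeds = [0.0, 1.0, 10.0, 20.0]
velocities = [0.0, 0.819377018, 8.47965933, 15.38036833]

# Linear interpolation written out by hand between wind_speed 1 and 10
by_hand = 0.819377018 + (4.6 - 1) / (10 - 1) * (8.47965933 - 0.819377018)

# The same interpolation done by NumPy
by_numpy = float(np.interp(4.6, speeds, velocities))

print(by_hand, by_numpy)  # both are roughly 3.88
```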

Any idea? Thanks

UPDATE

I have found a solution to the issue above. The trick is to use groupby and define a function that interpolates over the dataframe that is created by groupby and passed to apply(). In my case, this is the function

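import numpy as np
from scipy import interpolate
def interp(x, wind_speed):
    # x is the sub-DataFrame of one group; build Velocity as a linear function of wind_speed
    g = interpolate.interp1d(np.array(x["wind_speed"]), np.array(x["Velocity"]))
    return g(wind_speed)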

and this is my groupby

group = df.groupby("point").apply(interp, wind_speed)

The function interp has to be called with a parameter that represents the point where to perform the interpolation.
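Put together, the approach might look like the sketch below. The "point" column and the second point's velocities are made up for illustration (the extract above only shows X, Y, Z), so treat the data as hypothetical. Only the columns needed for the interpolation are selected before apply(), and rows with a NaN Velocity are assumed to be absent from df.

```python
import pandas as pd
from scipy import interpolate

# Hypothetical data: two points, each simulated at four wind speeds.
# "P1" uses the velocities from the question; "P2" is invented.
df = pd.DataFrame({
    "point": ["P1"] * 4 + ["P2"] * 4,
    "wind_speed": [0.0, 1.0, 10.0, 20.0] * 2,
    "Velocity": [0.0, 0.819377018, 8.47965933, 15.38036833,
                 0.0, 1.0, 10.0, 20.0],
})


def interp(x, wind_speed):
    # x holds the rows of one point; interpolate Velocity over wind_speed
    g = interpolate.interp1d(x["wind_speed"].to_numpy(), x["Velocity"].to_numpy())
    return float(g(wind_speed))


# Extra positional arguments to apply() are forwarded to interp
result = df.groupby("point")[["wind_speed", "Velocity"]].apply(interp, 4.6)
print(result)  # one interpolated velocity per point
```

Note that interp1d raises an error by default when the requested wind speed lies outside the simulated range.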

I wonder whether there is a better way to do it.

Rojj asked Dec 01 '14 18:12



1 Answer

I have found a solution to the issue above. The trick is to use groupby and define a function that interpolates over the dataframe that is created by groupby and passed to apply(). In my case, this is the function

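import numpy as np
from scipy import interpolate
def interp(x, wind_speed):
    # x is the sub-DataFrame of one group; build Velocity as a linear function of wind_speed
    g = interpolate.interp1d(np.array(x["wind_speed"]), np.array(x["Velocity"]))
    return g(wind_speed)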

and this is my groupby

group = df.groupby("point").apply(interp, wind_speed)

The function interp has to be called with a parameter that represents the point where to perform the interpolation.

I wonder whether there is a better way to do it.
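One possible alternative, sketched here rather than offered as the definitive answer, is to keep the NaN row in the DataFrame and fill it in place with numpy.interp, grouping by the columns that identify a simulation case. The key list and the loop are assumptions of this sketch, not part of the original solution.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [
        [0.0, -10.88948939, 74.22099994, 1.5, "NW", 0.0],
        [0.819377018, -10.88948939, 74.22099994, 1.5, "NW", 1.0],
        [8.47965933, -10.88948939, 74.22099994, 1.5, "NW", 10.0],
        [15.38036833, -10.88948939, 74.22099994, 1.5, "NW", 20.0],
        [np.nan, -10.88948939, 74.22099994, 1.5, "NW", 4.6],  # row to fill
    ],
    columns=["Velocity", "X", "Y", "Z", "wind_direction", "wind_speed"],
)

# Columns that identify one point and boundary condition (assumed here)
keys = ["X", "Y", "Z", "wind_direction"]

for _, labels in data.groupby(keys).groups.items():
    group = data.loc[labels]
    # Rows with a simulated velocity, sorted as np.interp requires
    known = group.dropna(subset=["Velocity"]).sort_values("wind_speed")
    # Rows whose velocity still has to be estimated
    missing = group[group["Velocity"].isna()]
    if known.empty or missing.empty:
        continue
    data.loc[missing.index, "Velocity"] = np.interp(
        missing["wind_speed"].to_numpy(),
        known["wind_speed"].to_numpy(),
        known["Velocity"].to_numpy(),
    )

print(data)
```

Unlike interp1d, np.interp does not raise for wind speeds outside the simulated range; it returns the nearest end value, so out-of-range rows would need a separate check.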

Rojj answered Dec 13 '22 05:12