Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fillna by groupby outputs in pandas?

Tags:

python

pandas

I have a dataframe having 4 columns(A,B,C,D). D has some NaN entries. I want to fill the NaN values by the average value of D having same value of A,B,C.

For example,if the value of A,B,C,D are x,y,z and Nan respectively,then I want the NaN value to be replaced by the average of D for the rows where the value of A,B,C are x,y,z respectively.

like image 905
Abhisek Dash Avatar asked Jan 16 '17 15:01

Abhisek Dash


People also ask

How do you get Fillna on pandas?

Pandas DataFrame fillna() MethodThe fillna() method replaces the NULL values with a specified value. The fillna() method returns a new DataFrame object unless the inplace parameter is set to True , in that case the fillna() method does the replacing in the original DataFrame instead.

How do you sort by Groupby in Python?

To group Pandas dataframe, we use groupby(). To sort grouped dataframe in ascending or descending order, use sort_values(). The size() method is used to get the dataframe size.

How do you get Groupby rows in pandas?

You can group DataFrame rows into a list by using pandas. DataFrame. groupby() function on the column of interest, select the column you want as a list from group and then use Series. apply(list) to get the list for every group.

What does Groupby return pandas?

As I said above groupby() function returns DataFrameGroupBy object after grouping the data on pandas DataFrame. This object contains several methods ( sum() , mean() e.t.c) that can be used to aggregate the grouped rows.


1 Answers

df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean')) would be faster than apply

In [2400]: df
Out[2400]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
Out[2401]:
0    1.0
1    2.0
2    3.0
3    5.0
Name: D, dtype: float64

In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))

In [2403]: df
Out[2403]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0

Details

In [2396]: df.shape
Out[2396]: (10000, 4)

In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
100 loops, best of 3: 3.44 ms per loop


In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
100 loops, best of 3: 5.34 ms per loop
like image 138
Zero Avatar answered Oct 04 '22 15:10

Zero