Pandas fillna using groupby

I am trying to impute (fill) missing values using other rows that have the same values in the key columns.

For example, I have this dataframe:

```
one | two | three
1      1     10
1      1     nan
1      1     nan
1      2     nan
1      2     20
1      2     nan
1      3     nan
1      3     nan
```

I want to use the keys in columns `one` and `two` to group rows with the same keys. If column `three` is not entirely NaN within a group, the missing values should be filled with the existing value from a row with the same keys.

Here is my desired result:

```
one | two | three
1      1     10
1      1     10
1      1     10
1      2     20
1      2     20
1      2     20
1      3     nan
1      3     nan
```

You can see that the rows with keys 1 and 3 do not get any value, because no existing value is available in that group.
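To make the example reproducible, here is a minimal sketch that rebuilds the sample DataFrame from the table above. Representing the missing entries as `np.nan` is an assumption about how the data was loaded. Counting the non-missing values per key pair shows that group (1, 3) has nothing to impute from.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 'one' and 'two' are the grouping keys,
# 'three' holds the values to impute (np.nan marks a missing entry).
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# count() ignores NaN, so a group whose count is 0 has no value to copy.
counts = df.groupby(['one', 'two'])['three'].count()
print(counts)
```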

I have tried using groupby+fillna():

```python
df['three'] = df.groupby(['one','two'])['three'].fillna()
```

which gave me an error.

I have also tried forward fill, which gives me a rather strange result: it seems to forward fill column 2 instead. This is the code I am using for forward fill:

```python
df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
```
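A per-group `ffill` on its own only carries values downward within each group. The sketch below rebuilds the sample data and applies it in isolation. Row 3 sits above the value 20 in group (1, 2), so it stays NaN. This may account for part of the unexpected result, though that is an inference from the sample data rather than something the question confirms.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Forward fill inside each (one, two) group: a value propagates only to the
# rows below it in the same group, never upward and never across groups.
filled = df.groupby(['one', 'two'], sort=False)['three'].ffill()
print(filled.tolist())
```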
asked Sep 24 '17 14:09 by Phurich.P

2 Answers

If there is only one non-NaN value per group, use ffill (forward fill) and bfill (backward fill) within each group, which needs apply with a lambda:

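```
# group_keys=False keeps the result on the original row index; recent pandas
# versions otherwise prepend the group keys and the assignment fails.
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))
print(df)
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
```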
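A possible alternative that avoids `apply` is `transform('first')`. This is a different technique from the ffill/bfill approach above: it broadcasts the first non-NaN value of each group to every row of that group, and `fillna` then fills only the gaps. The sketch below makes the same assumption of at most one distinct value per group. If a group holds several values, every NaN receives the group's first one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

keys = ['one', 'two']
# First non-NaN value of each group, aligned to df's index;
# an all-NaN group such as (1, 3) yields NaN.
first_valid = df.groupby(keys)['three'].transform('first')
# Fill only the missing entries; existing values are left untouched.
df['three'] = df['three'].fillna(first_valid)
print(df)
```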

But if there are multiple values per group and you need to replace NaN with some constant, e.g. the mean of each group:

```
print(df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

# group_keys=False keeps the result on the original row index.
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))
print(df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
```
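The group-mean fill can also be sketched without a lambda, assuming the same data as the second example: `transform('mean')` returns a Series aligned with the original rows, and `fillna` uses it only where values are missing. The mean skips NaN, so group (1, 1) gets (10 + 40) / 2 = 25, while the all-NaN group (1, 3) stays NaN.

```python
import numpy as np
import pandas as pd

# Data from the second example above (group (1, 1) holds two values).
df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Each row receives the mean of its own (one, two) group.
group_mean = df.groupby(['one', 'two'])['three'].transform('mean')
df['three'] = df['three'].fillna(group_mean)
print(df)
```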
answered Sep 20 '22 02:09 by jezrael


You can sort the data by the column with missing values, then group by the keys and forward fill:

```python
df.sort_values('three', inplace=True)
df['three'] = df.groupby(['one','two'])['three'].ffill()
```
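A variant of the same sort-then-ffill idea keeps `df` in its original row order. It sorts a temporary copy and relies on index alignment when assigning back. `sort_values` places NaN last by default, so within each group the existing value comes before the gaps. One caveat: if a group holds several different values, the NaNs receive the largest one, because it is the last non-NaN value after an ascending sort.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one':   [1, 1, 1, 1, 1, 1, 1, 1],
    'two':   [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Sort a copy (NaN goes last), then forward fill within each group.
filled = df.sort_values('three').groupby(['one', 'two'])['three'].ffill()

# Assignment aligns on the index, so df keeps its original row order.
df['three'] = filled
print(df)
```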
answered Sep 19 '22 02:09 by Mykola Zotko