Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filling in NaN values according to another Column and Row in pandas

Tags:

I have a DF like so:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015              NaN
Maria     Rice        2016              NaN
Jack      Steak       2011              NaN
Jack      Steak       2012               5
Jack      Steak       2013              NaN

I want the output to look like this:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015               3
Maria     Rice        2016               3
Jack      Steak       2011               5
Jack      Steak       2012               5
Jack      Steak       2013               5

I want to fill in the NaN's according to this condition:

If the row's Name, Food is the same and the Year's are consecutive:
     Fill the NaN's with the Month_eaten corresponding to the row that isn't a NaN

There will be a person that has all NaN's for the Month_eaten, but I don't need to worry about that for now. Only the one's with at least one value for the Month_eaten in any of the years.

Any thoughts would be appreciated!

like image 610
The Dodo Avatar asked Dec 20 '18 15:12

The Dodo


People also ask

How fill NaN values in pandas with values from another column?

fillna() method is used to fill NaN/NA values on a specified column or on an entire DataaFrame with any given value. You can specify modify using inplace, or limit how many filling to perform or choose an axis whether to fill on rows/column etc.


2 Answers

You can group on "Name", "Food", and a custom column created by diffing the rows of "Year_eaten".

u = df.Year_eaten.diff().bfill().ne(1).cumsum()
v = df.groupby(['Name','Food', v]).Month_eaten.transform('first')

df['Month_eaten'] = df.Month_eaten.fillna(v, downcast='infer')

df
    Name   Food  Year_eaten  Month_eaten
0  Maria   Rice        2014            3
1  Maria   Rice        2015            3
2  Maria   Rice        2016            3
3   Jack  Steak        2011            5
4   Jack  Steak        2012            5
5   Jack  Steak        2013            5

Another solution, if no group has all rows with NaN, is using groupby and ffill (everything else is the same).

df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.ffill().bfill()
df
    Name   Food  Year_eaten  Month_eaten
0  Maria   Rice        2014            3
1  Maria   Rice        2015            3
2  Maria   Rice        2016            3
3   Jack  Steak        2011            5
4   Jack  Steak        2012            5
5   Jack  Steak        2013            5
like image 187
cs95 Avatar answered Sep 27 '22 20:09

cs95


Using diff().ne(1).cumsum() create the continue year group key

continueyear=df.groupby(['Name','Food']).Year_eaten.apply(lambda x : x.diff().ne(1).cumsum())

Then using groupby with apply ffill and bfill

df.groupby([df.Name,df.Food,continueyear]).Month_eaten.apply(lambda x : x.ffill().bfill().astype(int))
Out[26]:
0    3
1    3
2    3
3    5
4    5
5    5
Name: Month_eaten, dtype: int32
like image 43
BENY Avatar answered Sep 27 '22 22:09

BENY