I have a DF like so: <pre class="prettyprint"><code>Name Food Year_eaten Month_eaten Maria Rice 2014 3 Maria Rice 2015 NaN Maria Rice 2016 NaN Jack Steak 2011 NaN Jack Steak 2012 5 Jack Steak 2013 NaN </code></pre> I want the output to look like this: <pre class="prettyprint"><code>Name Food Year_eaten Month_eaten Maria Rice 2014 3 Maria Rice 2015 3 Maria Rice 2016 3 Jack Steak 2011 5 Jack Steak 2012 5 Jack Steak 2013 5 </code></pre> I want to fill in the NaN's according to this condition: <pre class="prettyprint"><code>If the row's Name, Food is the same and the Year's are consecutive: Fill the NaN's with the Month_eaten corresponding to the row that isn't a NaN </code></pre> There will be a person that has all NaN's for the Month_eaten, but I don't need to worry about that for now. Only the one's with at least one value for the Month_eaten in any of the years. Any thoughts would be appreciated!

You can group on "Name", "Food", and a custom column created by <code>diff</code>ing the rows of "Year_eaten". <pre class="prettyprint"><code>u = df.Year_eaten.diff().bfill().ne(1).cumsum() v = df.groupby(['Name','Food', v]).Month_eaten.transform('first') df['Month_eaten'] = df.Month_eaten.fillna(v, downcast='infer') df Name Food Year_eaten Month_eaten 0 Maria Rice 2014 3 1 Maria Rice 2015 3 2 Maria Rice 2016 3 3 Jack Steak 2011 5 4 Jack Steak 2012 5 5 Jack Steak 2013 5 </code></pre> <hr> Another solution, if no group has all rows with NaN, is using <code>groupby</code> and <code>ffill</code> (everything else is the same). <pre class="prettyprint"><code>df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.ffill().bfill() df Name Food Year_eaten Month_eaten 0 Maria Rice 2014 3 1 Maria Rice 2015 3 2 Maria Rice 2016 3 3 Jack Steak 2011 5 4 Jack Steak 2012 5 5 Jack Steak 2013 5 </code></pre>

Using <code>diff().ne(1).cumsum()</code> create the continue year group key <pre class="prettyprint"><code>continueyear=df.groupby(['Name','Food']).Year_eaten.apply(lambda x : x.diff().ne(1).cumsum()) </code></pre> <hr> Then using <code>groupby</code> with <code>apply</code> <code>ffill</code> and <code>bfill</code> <pre class="prettyprint"><code>df.groupby([df.Name,df.Food,continueyear]).Month_eaten.apply(lambda x : x.ffill().bfill().astype(int)) Out[26]: 0 3 1 3 2 3 3 5 4 5 5 5 Name: Month_eaten, dtype: int32 </code></pre>

Filling in NaN values according to another Column and Row in pandas

Tags:

I have a DF like so:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015              NaN
Maria     Rice        2016              NaN
Jack      Steak       2011              NaN
Jack      Steak       2012               5
Jack      Steak       2013              NaN

I want the output to look like this:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015               3
Maria     Rice        2016               3
Jack      Steak       2011               5
Jack      Steak       2012               5
Jack      Steak       2013               5

I want to fill in the NaN's according to this condition:

If the row's Name, Food is the same and the Year's are consecutive:
     Fill the NaN's with the Month_eaten corresponding to the row that isn't a NaN

There will be a person that has all NaN's for the Month_eaten, but I don't need to worry about that for now. Only the one's with at least one value for the Month_eaten in any of the years.

Any thoughts would be appreciated!

610

asked Dec 20 '18 15:12

The Dodo

2 Answers

You can group on "Name", "Food", and a custom column created by diffing the rows of "Year_eaten".

u = df.Year_eaten.diff().bfill().ne(1).cumsum()
v = df.groupby(['Name','Food', v]).Month_eaten.transform('first')

df['Month_eaten'] = df.Month_eaten.fillna(v, downcast='infer')

df
    Name   Food  Year_eaten  Month_eaten
0  Maria   Rice        2014            3
1  Maria   Rice        2015            3
2  Maria   Rice        2016            3
3   Jack  Steak        2011            5
4   Jack  Steak        2012            5
5   Jack  Steak        2013            5

Another solution, if no group has all rows with NaN, is using groupby and ffill (everything else is the same).

df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.ffill().bfill()
df
    Name   Food  Year_eaten  Month_eaten
0  Maria   Rice        2014            3
1  Maria   Rice        2015            3
2  Maria   Rice        2016            3
3   Jack  Steak        2011            5
4   Jack  Steak        2012            5
5   Jack  Steak        2013            5

187

answered Sep 27 '22 20:09

cs95

Using diff().ne(1).cumsum() create the continue year group key

continueyear=df.groupby(['Name','Food']).Year_eaten.apply(lambda x : x.diff().ne(1).cumsum())

Then using groupby with apply ffill and bfill

df.groupby([df.Name,df.Food,continueyear]).Month_eaten.apply(lambda x : x.ffill().bfill().astype(int))
Out[26]:
0    3
1    3
2    3
3    5
4    5
5    5
Name: Month_eaten, dtype: int32

answered Sep 27 '22 22:09

BENY

Related questions
                            
                                NestedScrollView + CoodinatorLayout scrollBy() scrollTo() methods do nothing
                            
                                Does Sequelize.js escape input for SQL injection by default?
                            
                                How can I read file content with react-dropzone?
                            
                                Create component as a library in Angular Nrwl Nx
                            
                                JsPDF - Cannot read property 'charAt' of undefined
                            
                                How to connect with ODP.NET Core to Oracle 9i database - Managed Driver
                            
                                How to rotate fancy arrow in Matplotlib
                            
                                Python Flask - how to remember anonymous users via cookie/session?
                            
                                Problem updating three tables that are linked to eachother
                            
                                kotlin wrong nullability inference without any generics
                            
                                Dependency injection/ORM in Vertx for Java
                            
                                How is the Tf-Idf value calculated with analyzer ='char'?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With