Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Forward-fill dates within MultiIndexed DataFrame

Tags:

python

pandas

I have a PANDAs DataFrame with a MultiIndex, where one of the levels represents a year:

import pandas as pd
df = pd.DataFrame(dict(A = ['foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
                       B = ['white', 'black', 'white', 'white', 'black', 'black'],
                       year = [1990, 1992, 1990, 1992, 1991, 1992],
                       value = [3.14, 1.20, 4.56, 6.79, 0.01, 0.02]))

df = df.set_index(['A', 'B', 'year'])

I would like to forward-fill values, but only for the intervening years in each group (defined by the interaction of A and B). Here is the input:

                value
A   B     year       
foo white 1990   3.14
    black 1992   1.20
bar white 1990   4.56
          1992   6.79
    black 1991   0.01
          1992   0.02

And here is the desired output, with one additional row:

                value
A   B     year       
foo white 1990   3.14
    black 1992   1.20
bar white 1990   4.56
          1991   4.56  <-- new forward-filled value
          1992   6.79
    black 1991   0.01
          1992   0.02

How can I accomplish this concisely and efficiently? I've tried using combinations of groupby and apply, but I'm new to PANDAS and keep throwing Exceptions.

Here's an example of how I'm naively approaching the problem:

def ffill_years(df):
    df.reset_index(['A', 'B'])  # drop all but 'year'
    year_range = range(df['year'].min(), df['year'].max())
    df.reindex(pd.Series(years)).fillna("ffill")
    return df

df.groupby(level=['A', 'B']).apply(ffill_years)

Of course this doesn't work. Any and all tips appreciated!

like image 227
dholstius Avatar asked May 30 '26 01:05

dholstius


1 Answers

You were pretty close - a couple small changes:

  1. reset_index doesn't operate in place
  2. Can't reference index by name, need to use .index
  3. Need a +1 on your range to include the endpoint
  4. reindex is also not in-place
  5. First parameter to fillna is a fill value, use keyword method

See below:

def ffill_years(df):
    df = df.reset_index(['A','B'])  # drop all but 'year'
    year_range = range(df.index.min(), df.index.max() + 1)

    df = df.reindex(pd.Series(year_range)).fillna(method='ffill')
    return df

Results in

In [209]: df.groupby(level=['A','B']).apply(ffill_years)
Out[209]: 
                  A      B  value
A   B     year                   
bar black 1991  bar  black   0.01
          1992  bar  black   0.02
    white 1990  bar  white   4.56
          1991  bar  white   4.56
          1992  bar  white   6.79
foo black 1992  foo  black   1.20
    white 1990  foo  white   3.14
like image 105
chrisb Avatar answered May 31 '26 14:05

chrisb



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!