
Python Pandas: How to fill date ranges in a MultiIndex

Tags: python, pandas

Suppose I am organizing sales data for a membership business.

I only have the start and end dates. Ideally, sales for the months between the start and end dates should appear as 1 instead of missing.

I can't get the 'date' index level filled with the in-between dates; that is, I want a continuous set of months instead of gaps. I also need to fill missing data in other columns with ffill.

I have tried approaches such as stack/unstack and reindex, but each one raises a different error. I'm guessing there's a clean way to do this. What is the best practice?

Suppose the data has this MultiIndex structure:

                     variable  sales
vendor date                         
a      2014-01-01  start date      1
       2014-03-01    end date      1
b      2014-03-01  start date      1
       2014-07-01    end date      1

And the desired result:

                     variable  sales
vendor date                         
a      2014-01-01  start date      1
       2014-02-01         NaN      1
       2014-03-01    end date      1
b      2014-03-01  start date      1
       2014-04-01         NaN      1
       2014-05-01         NaN      1
       2014-06-01         NaN      1
       2014-07-01    end date      1
LPG asked Dec 02 '14 18:12


1 Answer

You can do:

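>>> # resample to month-end frequency, keeping the first row per month
>>> # (older pandas spelled this df.resample(rule='M', how='first'))
>>> f = lambda df: df.resample('M').first()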
>>> df.reset_index(level=0).groupby('vendor').apply(f).drop('vendor', axis=1)
                     variable  sales
vendor date                         
a      2014-01-31  start date      1
       2014-02-28         NaN    NaN
       2014-03-31    end date      1
b      2014-03-31  start date      1
       2014-04-30         NaN    NaN
       2014-05-31         NaN    NaN
       2014-06-30         NaN    NaN
       2014-07-31    end date      1

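Then just call .fillna on the sales column if needed. Note that 'M' labels each row with the month end; use 'MS' if you want month-start dates as in the question.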
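
For reference, here is a minimal sketch of one way to reach the exact frame the question asks for, with month-start dates and sales filled with 1. It swaps resample for an explicit per-vendor reindex over pd.date_range(..., freq='MS'). The helper name fill_months and the rebuilt sample frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Rebuild the question's sample frame (values copied from the question).
index = pd.MultiIndex.from_tuples(
    [
        ("a", pd.Timestamp("2014-01-01")),
        ("a", pd.Timestamp("2014-03-01")),
        ("b", pd.Timestamp("2014-03-01")),
        ("b", pd.Timestamp("2014-07-01")),
    ],
    names=["vendor", "date"],
)
df = pd.DataFrame(
    {
        "variable": ["start date", "end date", "start date", "end date"],
        "sales": [1, 1, 1, 1],
    },
    index=index,
)


def fill_months(group):
    """Hypothetical helper: reindex one vendor's rows onto every
    month start between that vendor's first and last date."""
    dates = group.index.get_level_values("date")
    full = pd.date_range(dates.min(), dates.max(), freq="MS", name="date")
    # Drop the vendor level so the remaining index is a plain DatetimeIndex.
    return group.droplevel("vendor").reindex(full)


# Fill each vendor separately, then glue the pieces back together
# with vendor as the outer index level.
pieces = {vendor: fill_months(group) for vendor, group in df.groupby(level="vendor")}
result = pd.concat(pieces, names=["vendor", "date"])

# Months inside a membership count as a sale of 1.
result["sales"] = result["sales"].fillna(1).astype(int)

print(result)
```

The variable column is left as NaN on the added rows, matching the desired result. If other columns should be carried forward instead, a per-vendor forward fill such as result.groupby(level='vendor').ffill() could be applied to them.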

behzad.nouri answered Sep 21 '22 17:09
