Suppose I'm trying to organize sales data for a membership business.
I only have the start and end dates. Ideally, sales for the months between the start and end dates would appear as 1 instead of missing.
I can't get the 'date' column to be filled with the in-between dates. That is, I want a continuous set of months instead of gaps. I also need to fill the missing data in the other columns with ffill.
I have tried different approaches such as stack/unstack and reindex, but each one raises a different error. I'm guessing there's a clean way to do this. What's the best practice here?
Suppose the multi-indexed data structure:
                     variable  sales
vendor date
a      2014-01-01  start date      1
       2014-03-01    end date      1
b      2014-03-01  start date      1
       2014-07-01    end date      1
And the desired result:
                     variable  sales
vendor date
a      2014-01-01  start date      1
       2014-02-01         NaN      1
       2014-03-01    end date      1
b      2014-03-01  start date      1
       2014-04-01         NaN      1
       2014-05-01         NaN      1
       2014-06-01         NaN      1
       2014-07-01    end date      1
You can do:
>>> # originally df.resample(rule='M', how='first'); the how= argument was later removed from pandas
>>> f = lambda df: df.resample('M').first()
>>> df.reset_index(level=0).groupby('vendor').apply(f).drop('vendor', axis=1)
                     variable  sales
vendor date
a      2014-01-31  start date      1
       2014-02-28         NaN    NaN
       2014-03-31    end date      1
b      2014-03-31  start date      1
       2014-04-30         NaN    NaN
       2014-05-31         NaN    NaN
       2014-06-30         NaN    NaN
       2014-07-31    end date      1
and then just call .fillna on the sales column if needed.
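For reference, here is a minimal sketch of the same idea that keeps the month-start dates shown in the desired result and forward-fills sales in one pass. It is not the resample call above: it swaps in reindex against a pd.date_range with freq="MS", on the assumption that every date in the frame falls on the first of a month. The helper name fill_months and the rebuilt sample frame are made up for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question: a (vendor, date) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [
        ("a", pd.Timestamp("2014-01-01")),
        ("a", pd.Timestamp("2014-03-01")),
        ("b", pd.Timestamp("2014-03-01")),
        ("b", pd.Timestamp("2014-07-01")),
    ],
    names=["vendor", "date"],
)
df = pd.DataFrame(
    {
        "variable": ["start date", "end date", "start date", "end date"],
        "sales": [1, 1, 1, 1],
    },
    index=index,
)


def fill_months(group):
    """Hypothetical helper: expand one vendor's rows to every month start.

    `group` is expected to be indexed by date only.
    """
    # Every month start between this vendor's first and last date.
    months = pd.date_range(group.index.min(), group.index.max(), freq="MS")
    # Rows for the new months are created with NaN in every column.
    filled = group.reindex(months)
    # Carry the last known sales value forward into the new rows.
    filled["sales"] = filled["sales"].ffill()
    return filled


# Process each vendor separately, then stitch the pieces back together
# under a (vendor, date) MultiIndex.
pieces = {
    vendor: fill_months(group.droplevel("vendor"))
    for vendor, group in df.groupby(level="vendor")
}
result = pd.concat(pieces, names=["vendor", "date"])
print(result)
```

The variable column is left as NaN for the in-between months, matching the desired result; sales comes back as a float column because the reindex step introduces NaN before the fill.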