I have a data frame containing a number of observations:
date colour orders
2014-10-20 red 7
2014-10-21 red 10
2014-10-20 yellow 3
I would like to re-index the data frame and standardise the dates.
date colour orders
2014-10-20 red 7
2014-10-21 red 10
2014-10-22 red NaN
2014-10-20 yellow 3
2014-10-21 yellow NaN
2014-10-22 yellow NaN
I though to order the data frame by colour
and date
, and then try to re-index it.
index = pd.date_range('20/10/2014', '22/10/2014')
test_df = df.sort(['colour', 'date'], ascending=(True, True))
ts = test_df.reindex(index)
ts
But it returns a new data frame with the right index but all NaN
values.
date colour orders
2014-10-20 NaN NaN
2014-10-21 NaN NaN
2014-10-22 NaN NaN
Starting from your exampe dataframe:
In [51]: df
Out[51]:
date colour orders
0 2014-10-20 red 7
1 2014-10-21 red 10
2 2014-10-20 yellow 3
If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a multi-index):
In [52]: df = df.set_index(['date', 'colour'])
In [53]: df
Out[53]:
orders
date colour
2014-10-20 red 7
2014-10-21 red 10
2014-10-20 yellow 3
You can now reindex this dataframe, after you constructed to desired index:
In [54]: index = pd.date_range('20/10/2014', '22/10/2014')
In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])
In [56]: df.reindex(multi_index)
Out[56]:
orders
2014-10-20 red 7
yellow 3
2014-10-21 red 10
yellow NaN
2014-10-22 red NaN
yellow NaN
To have the same output as your example output, the index should be sorted in the second level (level=1
as it is 0-based):
In [60]: df2 = df.reindex(multi_index)
In [64]: df2.sortlevel(level=1)
Out[64]:
orders
2014-10-20 red 7
2014-10-21 red 10
2014-10-22 red NaN
2014-10-20 yellow 3
2014-10-21 yellow NaN
2014-10-22 yellow NaN
A possible way to generate the multi-index automatically would be (with your original frame):
pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'),
df['colour'].unique()])
Another way would be to use resample
for each group of colors:
In [77]: df = df.set_index('date')
In [78]: df.groupby('colour').resample('D')
This is simpler, but this does not give you the full range of dates for each colour, only the range of dates that is available for that colour group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With