Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Re-index dataframe by new range of dates

I have a data frame containing a number of observations:

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-20   yellow     3

I would like to re-index the data frame and standardise the dates.

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-22   red        NaN
2014-10-20   yellow     3
2014-10-21   yellow     NaN
2014-10-22   yellow     NaN

I though to order the data frame by colour and date, and then try to re-index it.

index = pd.date_range('20/10/2014', '22/10/2014')
test_df = df.sort(['colour', 'date'], ascending=(True, True))
ts = test_df.reindex(index)
ts

But it returns a new data frame with the right index but all NaN values.

date         colour     orders
2014-10-20   NaN        NaN
2014-10-21   NaN        NaN
2014-10-22   NaN        NaN
like image 419
Gianluca Avatar asked Dec 11 '14 11:12

Gianluca


1 Answers

Starting from your exampe dataframe:

In [51]: df
Out[51]:
        date  colour  orders
0 2014-10-20     red       7
1 2014-10-21     red      10
2 2014-10-20  yellow       3

If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a multi-index):

In [52]: df = df.set_index(['date', 'colour'])

In [53]: df
Out[53]:
                   orders
date       colour
2014-10-20 red          7
2014-10-21 red         10
2014-10-20 yellow       3

You can now reindex this dataframe, after you constructed to desired index:

In [54]: index = pd.date_range('20/10/2014', '22/10/2014')

In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])

In [56]: df.reindex(multi_index)
Out[56]:
                   orders
2014-10-20 red          7
           yellow       3
2014-10-21 red         10
           yellow     NaN
2014-10-22 red        NaN
           yellow     NaN

To have the same output as your example output, the index should be sorted in the second level (level=1 as it is 0-based):

In [60]: df2 = df.reindex(multi_index)

In [64]: df2.sortlevel(level=1)
Out[64]:
                   orders
2014-10-20 red          7
2014-10-21 red         10
2014-10-22 red        NaN
2014-10-20 yellow       3
2014-10-21 yellow     NaN
2014-10-22 yellow     NaN

A possible way to generate the multi-index automatically would be (with your original frame):

pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'), 
                            df['colour'].unique()])

Another way would be to use resample for each group of colors:

In [77]: df = df.set_index('date')

In [78]: df.groupby('colour').resample('D')

This is simpler, but this does not give you the full range of dates for each colour, only the range of dates that is available for that colour group.

like image 144
joris Avatar answered Oct 30 '22 17:10

joris