I have time-indexed data:
    df2 = pd.DataFrame({
        'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]),
        'b': pd.Series([0.22, 0.3])
    })
    df2 = df2.set_index('day')
    df2

                   b
    day
    2012-01-01  0.22
    2012-01-03  0.30
What is the best way to extend this data frame so that it has one row for every day in January 2012 (say), with all columns (here only b) set to NaN where we don't have data?
So the desired result would be:
                   b
    day
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    2012-01-04   NaN
    ...
    2012-01-31   NaN
Many thanks!
You can reindex rows or columns with the reindex() method by specifying which axis to reindex. Labels in the new index that are not present in the DataFrame are assigned NaN by default.
To reset the index in pandas, chain .reset_index() onto the DataFrame object. Calling .reset_index() moves the existing index into the DataFrame as a separate column and replaces it with a default integer index.
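As a minimal sketch of the reset_index() behaviour described above (the frame here is a small stand-in for the question's data, built directly with a DatetimeIndex as an assumption):

```python
import pandas as pd

# Small frame whose index is named 'day', similar to the question's df2.
df = pd.DataFrame(
    {"b": [0.22, 0.30]},
    index=pd.DatetimeIndex(["2012-01-01", "2012-01-03"], name="day"),
)

# reset_index() returns a new frame: 'day' becomes an ordinary column
# and the rows get a default 0..n-1 integer index.
flat = df.reset_index()
print(flat)
```

Note that reset_index() does not add rows for missing days; it only moves the index into a column, so it is not a substitute for reindex() here.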
Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis. This reorders the existing data to match the new set of labels and inserts missing value (NA) markers in label locations where no data for the label existed.
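A short illustration of both effects, reordering and NaN insertion, might look like this (the labels 'x', 'y', 'z' and the column 'c' are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"b": [0.22, 0.30]}, index=["x", "z"])

# Row reindex: rows follow the order of the new labels,
# and 'y' (which has no data) is filled with NaN.
rows = df.reindex(["z", "y", "x"])

# Column reindex: 'c' does not exist, so it becomes an all-NaN column.
cols = df.reindex(columns=["b", "c"])

print(rows)
print(cols)
```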
Use this (current as of pandas 1.1.3):
    ix = pd.date_range(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D')
    df2.reindex(ix)
Which gives:
                   b
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    2012-01-04   NaN
    2012-01-05   NaN
    [...]
    2012-01-29   NaN
    2012-01-30   NaN
    2012-01-31   NaN
For older versions of pandas, replace pd.date_range with pd.DatetimeIndex.
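For reference, a self-contained version of the approach could look like the sketch below. Two details are additions of this sketch rather than part of the answer: the 'day' column is converted with pd.to_datetime before it becomes the index, so the index is a true DatetimeIndex, and the new range is given name='day' so the index name survives the reindex.

```python
from datetime import date

import pandas as pd

df2 = pd.DataFrame({
    "day": [date(2012, 1, 1), date(2012, 1, 3)],
    "b": [0.22, 0.30],
})

# Assumption of this sketch: convert the date objects to datetime64 first,
# so the index and the target range have the same dtype.
df2["day"] = pd.to_datetime(df2["day"])
df2 = df2.set_index("day")

# One label per calendar day in January 2012.
ix = pd.date_range(start="2012-01-01", end="2012-01-31", freq="D", name="day")

# Days without data get NaN in every column.
full = df2.reindex(ix)
print(full)
```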