
pandas - Extend Index of a DataFrame setting all columns for new rows to NaN?

Tags: python, pandas

I have time-indexed data:

    import pandas as pd
    from datetime import date

    df2 = pd.DataFrame({
        'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]),
        'b': pd.Series([0.22, 0.3]),
    })
    df2 = df2.set_index('day')
    df2

                   b
    day
    2012-01-01  0.22
    2012-01-03  0.30

What is the best way to extend this data frame so that it has one row for every day in January 2012 (say), with all columns (here only b) set to NaN on days where we don't have data?

So the desired result would be:

                   b
    day
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    2012-01-04   NaN
    ...
    2012-01-31   NaN

Many thanks!

paul asked Oct 01 '13 14:10



1 Answer

Use this (current as of pandas 1.1.3):

    ix = pd.date_range(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D')
    df2.reindex(ix)

Which gives:

                   b
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    2012-01-04   NaN
    2012-01-05   NaN
    [...]
    2012-01-29   NaN
    2012-01-30   NaN
    2012-01-31   NaN

For older versions of pandas replace pd.date_range with pd.DatetimeIndex.
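As a self-contained sketch of the same reindex approach, the snippet below rebuilds the question's frame and extends it to every day of January 2012. One step here is an assumption and not part of the answer as written: the index is first converted with pd.to_datetime, because the question's index holds plain datetime.date objects, and how those match against the Timestamp labels produced by pd.date_range may differ between pandas versions. The name `full` and the final rename_axis call are illustrative only.

```python
from datetime import date

import pandas as pd

# Rebuild the frame from the question: two dated rows, one value column.
df2 = pd.DataFrame(
    {"day": [date(2012, 1, 1), date(2012, 1, 3)], "b": [0.22, 0.3]}
).set_index("day")

# Assumption: convert the date objects to a DatetimeIndex so that the
# existing labels have the same type as the labels from pd.date_range.
df2.index = pd.to_datetime(df2.index)

# One label per calendar day in January 2012.
ix = pd.date_range(start="2012-01-01", end="2012-01-31", freq="D")

# reindex keeps rows whose labels already exist and fills the rest with NaN.
# rename_axis restores the index name, which the new index does not carry.
full = df2.reindex(ix).rename_axis("day")

print(full.head())
```

If a placeholder other than NaN is wanted for the missing days, reindex also accepts a fill_value argument.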

Mark answered Sep 23 '22 03:09