Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Expand Pandas date range

Tags:

python

pandas

I have data that looks like this. Each row represents a value of that ID at some date.

ID   Date         Value
A    2012-01-05   50
A    2012-01-08   100
A    2012-01-10   200
B    2012-07-01   10
B    2012-07-03   20

I need to expand this so that I have rows for all days. The value of each day should be the value of the day before (i.e., think of the data above as updates of values, and the data below as a timeseries of values).

ID   Date         Value
A    2012-01-05   50
A    2012-01-06   50
A    2012-01-07   50
A    2012-01-08   100
A    2012-01-09   100
A    2012-01-10   200
B    2012-07-01   10
B    2012-07-02   10
B    2012-07-03   20

Currently, I have a solution that amounts to the following:

  • Group by ID
  • For each group, figure out the min and max date
  • Create a pd.date_range
  • Iterate simultaneously through the rows and through the date range, filling the values in the date range and incrementing the index-pointer to the rows if necessary
  • Append all these date ranges to a final dataframe

It works, but seems like a pretty bad bruteforce solution. I wonder if there's a better approach supported by Pandas?

like image 675
gberger Avatar asked Dec 08 '25 18:12

gberger


2 Answers

Using resample on Date indexed dataframe with ID groups and ffill on value

In [1725]: df.set_index('Date').groupby('ID').resample('1D')['Value'].ffill().reset_index()
Out[1725]:
  ID       Date  Value
0  A 2012-01-05     50
1  A 2012-01-06     50
2  A 2012-01-07     50
3  A 2012-01-08    100
4  A 2012-01-09    100
5  A 2012-01-10    200
6  B 2012-07-01     10
7  B 2012-07-02     10
8  B 2012-07-03     20
like image 177
Zero Avatar answered Dec 10 '25 21:12

Zero


Or you can try this one (Notice : this can be used for expend numeric column too ).

df.Date=pd.to_datetime(df.Date)
df=df.set_index(df.Date)
df.set_index(df.Date).groupby('ID')\
   .apply(lambda x : x.reindex(pd.date_range(min(x.index), max(x.index),freq='D')))\
     .ffill().reset_index(drop=True)

Out[519]: 
  ID       Date  Value
0  A 2012-01-05   50.0
1  A 2012-01-05   50.0
2  A 2012-01-05   50.0
3  A 2012-01-08  100.0
4  A 2012-01-08  100.0
5  A 2012-01-10  200.0
6  B 2012-07-01   10.0
7  B 2012-07-01   10.0
8  B 2012-07-03   20.0
like image 22
BENY Avatar answered Dec 10 '25 21:12

BENY