I have data that looks like this. Each row represents a value of that ID at some date.
ID Date Value
A 2012-01-05 50
A 2012-01-08 100
A 2012-01-10 200
B 2012-07-01 10
B 2012-07-03 20
I need to expand this so that I have rows for all days. The value of each day should be the value of the day before (i.e., think of the data above as updates of values, and the data below as a timeseries of values).
ID Date Value
A 2012-01-05 50
A 2012-01-06 50
A 2012-01-07 50
A 2012-01-08 100
A 2012-01-09 100
A 2012-01-10 200
B 2012-07-01 10
B 2012-07-02 10
B 2012-07-03 20
Currently, I have a solution that amounts to the following:
It works, but seems like a pretty bad bruteforce solution. I wonder if there's a better approach supported by Pandas?
Using resample on Date indexed dataframe with ID groups and ffill on value
In [1725]: df.set_index('Date').groupby('ID').resample('1D')['Value'].ffill().reset_index()
Out[1725]:
ID Date Value
0 A 2012-01-05 50
1 A 2012-01-06 50
2 A 2012-01-07 50
3 A 2012-01-08 100
4 A 2012-01-09 100
5 A 2012-01-10 200
6 B 2012-07-01 10
7 B 2012-07-02 10
8 B 2012-07-03 20
Or you can try this one (Notice : this can be used for expend numeric column too ).
df.Date=pd.to_datetime(df.Date)
df=df.set_index(df.Date)
df.set_index(df.Date).groupby('ID')\
.apply(lambda x : x.reindex(pd.date_range(min(x.index), max(x.index),freq='D')))\
.ffill().reset_index(drop=True)
Out[519]:
ID Date Value
0 A 2012-01-05 50.0
1 A 2012-01-05 50.0
2 A 2012-01-05 50.0
3 A 2012-01-08 100.0
4 A 2012-01-08 100.0
5 A 2012-01-10 200.0
6 B 2012-07-01 10.0
7 B 2012-07-01 10.0
8 B 2012-07-03 20.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With