Imagine I have a dataframe that looks like:
ID DATE VALUE
1 31-01-2006 5
1 28-02-2006 5
1 31-05-2006 10
1 30-06-2006 11
2 31-01-2006 5
2 31-02-2006 5
2 31-03-2006 5
2 31-04-2006 5
As you can see, this is panel data with multiple entries on the same date for different IDs. What I want to do is fill in the missing dates for each ID. For ID "1", for example, there is a gap of two months between the second and third entries.
I would like a dataframe that looks like:
ID DATE VALUE
1 31-01-2006 5
1 28-02-2006 5
1 31-03-2006 NA
1 30-04-2006 NA
1 31-05-2006 10
1 30-06-2006 11
2 31-01-2006 5
2 31-02-2006 5
2 31-03-2006 5
2 31-04-2006 5
I have no idea how to do this: I cannot index by date, because the dates are duplicated across IDs.
One way is to use pivot_table and then unstack:
In [11]: df.pivot_table("VALUE", "DATE", "ID")
Out[11]:
ID 1 2
DATE
28-02-2006 5.0 NaN
30-06-2006 11.0 NaN
31-01-2006 5.0 5.0
31-02-2006 NaN 5.0
31-03-2006 NaN 5.0
31-04-2006 NaN 5.0
31-05-2006 10.0 NaN
In [12]: df.pivot_table("VALUE", "DATE", "ID").unstack().reset_index()
Out[12]:
ID DATE 0
0 1 28-02-2006 5.0
1 1 30-06-2006 11.0
2 1 31-01-2006 5.0
3 1 31-02-2006 NaN
4 1 31-03-2006 NaN
5 1 31-04-2006 NaN
6 1 31-05-2006 10.0
7 2 28-02-2006 NaN
8 2 30-06-2006 NaN
9 2 31-01-2006 5.0
10 2 31-02-2006 5.0
11 2 31-03-2006 5.0
12 2 31-04-2006 5.0
13 2 31-05-2006 NaN
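For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the question's sample, with DATE kept as plain strings (an assumption, since the question does not show dtypes). Passing `name="VALUE"` to `reset_index` is my addition so the value column is not left labelled `0`.

```python
import pandas as pd

# Sample data copied from the question; DATE is left as strings (assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": ["31-01-2006", "28-02-2006", "31-05-2006", "30-06-2006",
             "31-01-2006", "31-02-2006", "31-03-2006", "31-04-2006"],
    "VALUE": [5, 5, 10, 11, 5, 5, 5, 5],
})

# Wide form: one row per DATE, one column per ID, NaN where a pair is missing
wide = df.pivot_table(values="VALUE", index="DATE", columns="ID")

# Back to long form: every (ID, DATE) combination now has a row
filled = wide.unstack().reset_index(name="VALUE")
print(filled)
```

Note that `pivot_table` aggregates with the mean by default, so this only reproduces the original values when each (ID, DATE) pair appears at most once.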
An alternative, perhaps slightly more efficient, way is to reindex against the full product of the index levels, built with MultiIndex.from_product:
In [21]: df1 = df.set_index(['ID', 'DATE'])
In [22]: df1.reindex(pd.MultiIndex.from_product(df1.index.levels))
Out[22]:
VALUE
1 28-02-2006 5.0
30-06-2006 11.0
31-01-2006 5.0
31-02-2006 NaN
31-03-2006 NaN
31-04-2006 NaN
31-05-2006 10.0
2 28-02-2006 NaN
30-06-2006 NaN
31-01-2006 5.0
31-02-2006 5.0
31-03-2006 5.0
31-04-2006 5.0
31-05-2006 NaN
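The same idea as a runnable sketch, again assuming the question's sample with DATE as strings. Passing `names=` explicitly is my addition; it keeps the ID and DATE labels on the new index so that `reset_index` gives back properly named columns.

```python
import pandas as pd

# Sample data copied from the question; DATE is left as strings (assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": ["31-01-2006", "28-02-2006", "31-05-2006", "30-06-2006",
             "31-01-2006", "31-02-2006", "31-03-2006", "31-04-2006"],
    "VALUE": [5, 5, 10, 11, 5, 5, 5, 5],
})

df1 = df.set_index(["ID", "DATE"])

# Every ID paired with every DATE that occurs anywhere in the data
full_index = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)

# Reindexing inserts NaN rows for the pairs that were missing
filled = df1.reindex(full_index).reset_index()
print(filled)
```

Unlike `pivot_table`, `reindex` does no aggregation, so it will raise an error if the (ID, DATE) index contains duplicates.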
Another solution is to convert the incomplete data to a "wide" form (a table; this will create cells for the missing values) and then back to a "tall" form.
df.set_index(['ID','DATE']).unstack().stack(dropna=False).reset_index()
# ID DATE VALUE
#0 1 28-02-2006 5.0
#1 1 30-06-2006 11.0
#2 1 31-01-2006 5.0
#3 1 31-02-2006 NaN
#4 1 31-03-2006 NaN
#5 1 31-04-2006 NaN
#6 1 31-05-2006 10.0
#7 2 28-02-2006 NaN
#....
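All three answers above pair every ID with every date seen anywhere in the data, and because DATE is a string the rows sort as text (28-02-2006 comes before 31-01-2006). If you only want the gaps inside each ID's own date span filled, as in the desired output, a different technique is to parse the dates and reindex each group against its own month-end range. This is a rough sketch, not one of the answers above. It assumes the dates are month-ends, and it uses a hypothetical sample in which the nonexistent 31-02-2006 and 31-04-2006 are replaced by 28-02-2006 and 30-04-2006 so that they parse.

```python
import pandas as pd

# Hypothetical sample: the question's data with the invalid dates corrected
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": ["31-01-2006", "28-02-2006", "31-05-2006", "30-06-2006",
             "31-01-2006", "28-02-2006", "31-03-2006", "30-04-2006"],
    "VALUE": [5, 5, 10, 11, 5, 5, 5, 5],
})
df["DATE"] = pd.to_datetime(df["DATE"], format="%d-%m-%Y")

pieces = []
for id_, group in df.groupby("ID"):
    # Month-end dates from this ID's first observation to its last
    full = pd.date_range(group["DATE"].min(), group["DATE"].max(),
                         freq=pd.offsets.MonthEnd())
    piece = group.set_index("DATE").reindex(full)
    piece["ID"] = id_  # restore the ID on the newly inserted rows
    pieces.append(piece.rename_axis("DATE").reset_index())

result = pd.concat(pieces, ignore_index=True)[["ID", "DATE", "VALUE"]]
print(result)
```

With this sample, ID 1 gains rows for March and April with missing VALUE, and ID 2 is left unchanged.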