I would like to add missing rows corresponding to a given index.
For example, if I have:
df = pd.DataFrame({"date": ["1", "2", "1", "3"],
"name": ["bob", "bob", "anne", "anne"],
"x": [1, 2, 2, 3],
"y": [2, 4, 5, 5]})
I would like to obtain the following:
name date x y
anne 1 2 5
anne 2 NA NA <- because date 2 is missing for Anne
anne 3 3 5
bob 1 1 2
bob 2 2 4
bob 3 NA NA <- because date 3 is missing for Bob
I have tried numerous things with pivot_table, pivot, but could not figure it out so far.
df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).reset_index()
is not filling missing rows.
Use DataFrame.set_index
with DataFrame.unstack
, DataFrame.stack
and DataFrame.reset_index
:
df = df.set_index(["name", "date"]).unstack().stack(dropna=False).reset_index()
print (df)
name date x y
0 anne 1 2.0 5.0
1 anne 2 NaN NaN
2 anne 3 3.0 5.0
3 bob 1 1.0 2.0
4 bob 2 2.0 4.0
5 bob 3 NaN NaN
Your solution is possible same way:
df = df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).unstack().stack(dropna=False).reset_index()
print (df)
name date x y
0 anne 1 2.0 5.0
1 anne 2 NaN NaN
2 anne 3 3.0 5.0
3 bob 1 1.0 2.0
4 bob 2 2.0 4.0
5 bob 3 NaN NaN
Another way is Multiindex.from_product
with reindex
:
idx = pd.MultiIndex.from_product((df['date'].unique(),
df['name'].unique()),names=['date','name'])
df.set_index(['date','name']).reindex(idx).sort_index(level=1).reset_index()
date name x y
0 1 anne 2.0 5.0
1 2 anne NaN NaN
2 3 anne 3.0 5.0
3 1 bob 1.0 2.0
4 2 bob 2.0 4.0
5 3 bob NaN NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With