I have a pandas DataFrame with a two-level multiindex. The second level is numeric and supposed to be sorted and sequential for each unique value of the first-level index, but has gaps. How do I insert the "missing" rows? Sample input:
import pandas as pd
df = pd.DataFrame(list(range(5)),
index=pd.MultiIndex.from_tuples([('A',1), ('A',3),
('B',2), ('B',3), ('B',6)]),
columns='value')
# value
#A 1 0
# 3 1
#B 2 2
# 3 3
# 6 4
Expected output:
# value
#A 1 0
# 2 NaN
# 3 1
#B 2 2
# 3 3
# 4 NaN
# 5 NaN
# 6 4
I suspect I could have used resample
, but I am having trouble converting the numbers to anything date-like.
If there is a will, there is a way. I am not proud of this but, I think it works.
Try:
def f(x):
levels = x.index.remove_unused_levels().levels
x = x.reindex(pd.MultiIndex.from_product([levels[0], np.arange(levels[1][0], levels[1][-1]+1)]))
return x
df.groupby(level=0, as_index=False, group_keys=False).apply(f)
Output:
value
A 1 0.0
2 NaN
3 1.0
B 2 2.0
3 3.0
4 NaN
5 NaN
6 4.0
After much deliberations, I was able to come up with a solution myself. Judging by the fact of how lousy it is, the problem I am facing is not a very typical one.
new_index = d.index.to_frame()\
.groupby(0)[1]\
.apply(lambda x:
pd.Series(1, index=range(x.min(), x.max() + 1))).index
d.reindex(new_index)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With