I have a pandas DataFrame with a two-level multiindex. The second level is numeric and supposed to be sorted and sequential for each unique value of the first-level index, but has gaps. How do I insert the "missing" rows? Sample input:
import pandas as pd
df = pd.DataFrame(list(range(5)),
                  index=pd.MultiIndex.from_tuples([('A',1), ('A',3),
                                                   ('B',2), ('B',3), ('B',6)]),
                  columns='value')
#     value
#A 1      0
#  3      1
#B 2      2
#  3      3
#  6      4
Expected output:
#     value
#A 1      0
#  2    NaN
#  3      1
#B 2      2
#  3      3
#  4    NaN
#  5    NaN
#  6      4
I suspect I could have used resample, but I am having trouble converting the numbers to anything date-like.
If there is a will, there is a way. I am not proud of this but, I think it works.
Try:
def f(x):
    levels = x.index.remove_unused_levels().levels
    x = x.reindex(pd.MultiIndex.from_product([levels[0], np.arange(levels[1][0], levels[1][-1]+1)]))
    return x
df.groupby(level=0, as_index=False, group_keys=False).apply(f)
Output:
     value
A 1    0.0
  2    NaN
  3    1.0
B 2    2.0
  3    3.0
  4    NaN
  5    NaN
  6    4.0
                        After much deliberations, I was able to come up with a solution myself. Judging by the fact of how lousy it is, the problem I am facing is not a very typical one.
new_index = d.index.to_frame()\
                .groupby(0)[1]\
                .apply(lambda x:
                         pd.Series(1, index=range(x.min(), x.max() + 1))).index
d.reindex(new_index)
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With