I have a dataset like this where data for some years are missing .
County Year Pop
12 1999 1.1
12 2001 1.2
13 1999 1.0
13 2000 1.1
I want something like
County Year Pop
12 1999 1.1
12 2000 NaN
12 2001 1.2
13 1999 1.0
13 2000 1.1
13 2001 nan
I have tried setting index to year and then using reindex with another dataframe of just years method (mentioned here Pandas: Add data for missing months) but it gives me error cant reindex with duplicate values. I have also tried df.loc but it has same issue. I even tried a full outer join with blank df of just years but that also didnt work.
How can I solve this?
Make a MultiIndex so you don't have duplicates:
df.set_index(['County', 'Year'], inplace=True)
Then construct a full MultiIndex with all the combinations:
index = pd.MultiIndex.from_product(df.index.levels)
Then reindex:
df.reindex(index)
The construction of the MultiIndex is untested and may need a little tweaking (e.g. if a year is entirely absent from all counties), but I think you get the idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With