Pandas-Add missing years in time series data with duplicate years

Question

I have a dataset like this where data for some years are missing .

County Year Pop
12     1999 1.1
12     2001 1.2
13     1999 1.0
13     2000 1.1

I want something like

County Year Pop
12     1999 1.1
12     2000 NaN
12     2001 1.2
13     1999 1.0
13     2000 1.1
13     2001 nan

I have tried setting index to year and then using reindex with another dataframe of just years method (mentioned here Pandas: Add data for missing months) but it gives me error cant reindex with duplicate values. I have also tried df.loc but it has same issue. I even tried a full outer join with blank df of just years but that also didnt work.

How can I solve this?

John Zwinck · Accepted Answer

Make a MultiIndex so you don't have duplicates:

df.set_index(['County', 'Year'], inplace=True)

Then construct a full MultiIndex with all the combinations:

index = pd.MultiIndex.from_product(df.index.levels)

Then reindex:

df.reindex(index)

The construction of the MultiIndex is untested and may need a little tweaking (e.g. if a year is entirely absent from all counties), but I think you get the idea.

Pandas-Add missing years in time series data with duplicate years

Tags:

python

pandas

missing-data

time-series

reindex

ks2882

1 Answers

John Zwinck

Recent Activity

Donate For Us

Pandas-Add missing years in time series data with duplicate years

Tags:

python

pandas

missing-data

time-series

reindex

ks2882

1 Answers

John Zwinck

Related questions

Recent Activity

Donate For Us