Add missing rows for groups in pandas

Question

I have a data with a country, year and value column. the max(year) is 1985 and min(year) is 2016 but not all country have data of all year(1985-2016). So to plot comparable bar plot between countries, I want to add value 0 for missing year for each country.

for example:

df -->
    country year    value
0   India   2040    354
1   India   2041    357
2   India   2042    454
3   USA     2040    454
4   USA     2041    436

As USA don't have 2042 data so adding it result into:

    country year    value
0   India   2040    354
1   India   2041    357
2   India   2042    454
3   USA     2040    454
4   USA     2041    436
5   USA     2042    0

How to do it for each country in my data?

cs95 · Accepted Answer

We can convert "year" to a Categorical column and then allow pandas GroupBy to do the heavy lifting:

df['year'] = pd.Categorical(df['year'], categories=df['year'].unique())
df.groupby(['country','year'], as_index=False).first()

  country  year  value
0   India  2040  354.0
1   India  2041  357.0
2   India  2042  454.0
3     USA  2040  454.0
4     USA  2041  436.0
5     USA  2042    NaN

Another idea is reindexing:

mux = pd.MultiIndex.from_product([df['country'].unique(), df['year'].unique()])

(df.set_index(['country', 'year'])
   .reindex(mux)
   .reset_index()
   .set_axis(df.columns, axis=1))

  country  year  value
0   India  2040  354.0
1   India  2041  357.0
2   India  2042  454.0
3     USA  2040  454.0
4     USA  2041  436.0
5     USA  2042    NaN

Important caveat: Neither of these solutions will handle duplicate rows well. You will need to dedupe the rows by adding a uniquely identifying column, possibly using GroupBy.cumcount.

BENY · Answer

Let us try pivot then stack

out = df.pivot(*df).stack(dropna=False).reset_index(name='value')
  country  year  value
0   India  2040  354.0
1   India  2041  357.0
2   India  2042  454.0
3     USA  2040  454.0
4     USA  2041  436.0
5     USA  2042    NaN

Add missing rows for groups in pandas

Tags:

python

pandas

dataframe

harsh poddar

2 Answers

cs95

BENY

Recent Activity

Donate For Us

Add missing rows for groups in pandas

Tags:

python

pandas

dataframe

harsh poddar

2 Answers

cs95

BENY

Related questions

Recent Activity

Donate For Us