Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape Pandas Dataframe with duplicate Index and fill missing rows

Tags:

python

pandas

I would like to add missing rows corresponding to a given index.

For example, if I have:

df = pd.DataFrame({"date": ["1", "2", "1", "3"],
                   "name": ["bob", "bob", "anne", "anne"],
                   "x": [1, 2, 2, 3],
                   "y": [2, 4, 5, 5]})

I would like to obtain the following:

    name    date    x   y
    anne    1       2   5
    anne    2       NA  NA   <- because date 2 is missing for Anne
    anne    3       3   5
    bob     1       1   2
    bob     2       2   4
    bob     3       NA  NA   <- because date 3 is missing for Bob

I have tried numerous things with pivot_table, pivot, but could not figure it out so far.

df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).reset_index()

is not filling missing rows.

like image 686
Timothée HENRY Avatar asked Dec 31 '22 07:12

Timothée HENRY


2 Answers

Use DataFrame.set_index with DataFrame.unstack, DataFrame.stack and DataFrame.reset_index:

df = df.set_index(["name", "date"]).unstack().stack(dropna=False).reset_index()
print (df)
   name date    x    y
0  anne    1  2.0  5.0
1  anne    2  NaN  NaN
2  anne    3  3.0  5.0
3   bob    1  1.0  2.0
4   bob    2  2.0  4.0
5   bob    3  NaN  NaN

Your solution is possible same way:

df = df.pivot_table(index = ["name", "date"], values = ['x','y'], fill_value=0).unstack().stack(dropna=False).reset_index()
print (df)
   name date    x    y
0  anne    1  2.0  5.0
1  anne    2  NaN  NaN
2  anne    3  3.0  5.0
3   bob    1  1.0  2.0
4   bob    2  2.0  4.0
5   bob    3  NaN  NaN
like image 78
jezrael Avatar answered Jan 02 '23 19:01

jezrael


Another way is Multiindex.from_product with reindex:

idx = pd.MultiIndex.from_product((df['date'].unique(),
                                 df['name'].unique()),names=['date','name'])
df.set_index(['date','name']).reindex(idx).sort_index(level=1).reset_index()

  date  name    x    y
0    1  anne  2.0  5.0
1    2  anne  NaN  NaN
2    3  anne  3.0  5.0
3    1   bob  1.0  2.0
4    2   bob  2.0  4.0
5    3   bob  NaN  NaN
like image 36
anky Avatar answered Jan 02 '23 20:01

anky