Inserting missing records with zero values in grouped data in pandas

Question

I have a dataframe df:

import pandas as pd
s = {'id': [243,243, 243, 243, 443,443,443, 332,334,332,332, 333],
 'col':[1,1,1,1,1,1,1,2,2,2,2,2],
 'st': [1,3,5,9,12, 18,23, 1,2,4,8,14],
 'value':[2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1, 2.0, 2.5, 3.4, 1.2, 2.4]}
df = pd.DataFrame(s)

It looks like:

     id  col  st  value
0   243    1   1    2.4
1   243    1   3    3.8
2   243    1   5    3.7
3   243    1   9    5.6
4   443    1  12    1.2
5   443    1  18    0.2
6   443    1  23    2.1
7   332    2   1    2.0
8   334    2   2    2.5
9   332    2   4    3.4
10  332    2   8    1.2
11  333    2  14    2.4

The data has two groups, col 1 and col 2 (the real data has many more). Within each group, I want to insert a record for every missing st value, with id and value set to 0.

My output must look like

id  col  st  value
243    1   1    2.4
0      1   2     0
243    1   3    3.8
0      1   4     0
243    1   5    3.7

and so on

332    2   1    2.0
334    2   2    2.5
0      2   3     0
332    2   4    3.4
0      2   5     0
0      2   6     0
0      2   7     0
332    2   8    1.2

How can I do this in pandas?

jezrael · Accepted Answer

Use DataFrame.reindex per group with GroupBy.apply and range, filling the newly created rows with 0:

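# Index by st so that reindex can fill the gaps in st within each group.
# Selecting several columns needs a list (double brackets); the tuple form
# ['id','value'] is no longer accepted by recent pandas versions.
df = (df.set_index('st')
        .groupby('col')[['id', 'value']]
        .apply(lambda x: x.reindex(range(x.index.min(), x.index.max() + 1), fill_value=0))
        .reset_index()
       )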

print(df)
    col  st   id  value
0     1   1  243    2.4
1     1   2    0    0.0
2     1   3  243    3.8
3     1   4    0    0.0
4     1   5  243    3.7
5     1   6    0    0.0
6     1   7    0    0.0
7     1   8    0    0.0
8     1   9  243    5.6
9     1  10    0    0.0
10    1  11    0    0.0
11    1  12  443    1.2
12    1  13    0    0.0
13    1  14    0    0.0
14    1  15    0    0.0
15    1  16    0    0.0
16    1  17    0    0.0
17    1  18  443    0.2
18    1  19    0    0.0
19    1  20    0    0.0
20    1  21    0    0.0
21    1  22    0    0.0
22    1  23  443    2.1
23    2   1  332    2.0
24    2   2  334    2.5
25    2   3    0    0.0
26    2   4  332    3.4
27    2   5    0    0.0
28    2   6    0    0.0
29    2   7    0    0.0
30    2   8  332    1.2
31    2   9    0    0.0
32    2  10    0    0.0
33    2  11    0    0.0
34    2  12    0    0.0
35    2  13    0    0.0
36    2  14  333    2.4
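
To see what the lambda does inside each group, here is a minimal sketch on a single hand-built group (the names one_group, full_range and filled are only for illustration and are not part of the answer above). It assumes st is already the index, as it is after set_index('st'):

```python
import pandas as pd

# A small stand-in for one group: the first rows of col 2, indexed by st.
one_group = pd.DataFrame(
    {'id': [332, 334, 332], 'value': [2.0, 2.5, 3.4]},
    index=pd.Index([1, 2, 4], name='st'),
)

# Every st from the group's minimum to its maximum, inclusive.
full_range = range(one_group.index.min(), one_group.index.max() + 1)

# reindex keeps the existing rows and creates the missing ones (here st=3);
# fill_value=0 puts 0 in every column of the newly created rows.
filled = one_group.reindex(full_range, fill_value=0)
print(filled)
```

GroupBy.apply runs this same step once per col value and stacks the results, which is why each group gets its own min-to-max range rather than one global range.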

BENY · Answer

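A method using unnesting: first create the range of st values per group with groupby + agg, then explode the ranges into rows and merge with the original frame. The unnesting helper has to be defined before it is called: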

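import numpy as np

def unnesting(df, explode):
    # Repeat each row's index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column into one long column
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(columns=explode), how='left')

# Per (id, col) group, build the full list of st values between min and max
s = df.groupby(['id', 'col']).st.agg(['min', 'max'])
s['st'] = [list(range(x, y + 1)) for x, y in zip(s['min'], s['max'])]
newdf = unnesting(s.drop(columns=['min', 'max']).reset_index(), ['st']).merge(df, how='left').fillna(0)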
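
On pandas 0.25 or later, the built-in DataFrame.explode can stand in for a hand-written unnesting helper. The following is only a sketch of that idea, not the answer's exact method: it assumes grouping by col alone (rather than by id and col) so that the result matches the output asked for in the question, and the names bounds and grid are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [243, 243, 243, 243, 443, 443, 443, 332, 334, 332, 332, 333],
    'col': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'st': [1, 3, 5, 9, 12, 18, 23, 1, 2, 4, 8, 14],
    'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1, 2.0, 2.5, 3.4, 1.2, 2.4],
})

# Smallest and largest st per group (assumption: groups are defined by col only).
bounds = df.groupby('col')['st'].agg(['min', 'max'])

# One row per group, holding the full list of st values that group should contain.
grid = pd.DataFrame({
    'col': bounds.index,
    'st': [list(range(lo, hi + 1)) for lo, hi in zip(bounds['min'], bounds['max'])],
})

# explode gives each list element its own row; the column comes back as
# object dtype, so cast it to int to match df['st'] before merging.
grid = grid.explode('st').astype({'st': 'int64'})

# A left merge keeps every (col, st) pair; pairs absent from df get NaN,
# which fillna turns into 0. id becomes float because of the NaNs, so cast it back.
newdf = (grid.merge(df, on=['col', 'st'], how='left')
             .fillna(0)
             .astype({'id': 'int64'}))
print(newdf)
```

Merging on an explicit on=['col', 'st'] keeps id out of the join keys, so the inserted rows end up with id 0 instead of inheriting an id from the group.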
