Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

inserting missing record with values as zero in grouped data in pandas

Tags:

python

pandas

I have a dataframe df:

import pandas as pd
s = {'id': [243,243, 243, 243, 443,443,443, 332,334,332,332, 333],
 'col':[1,1,1,1,1,1,1,2,2,2,2,2],
 'st': [1,3,5,9,12, 18,23, 1,2,4,8,14],
 'value':[2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1, 2.0, 2.5, 3.4, 1.2, 2.4]}
df = pd.DataFrame(s)

It looks like:

id      col  st  value
0   243    1   1    2.4
1   243    1   3    3.8
2   243    1   5    3.7
3   243    1   9    5.6
4   443    1  12    1.2
5   443    1  18    0.2
6   443    1  23    2.1
7   332    2   1    2.0
8   334    2   2    2.5
9   332    2   4    3.4
10  332    2   8    1.2
11  333    2  14    2.4

The data have two groups col 1 and 2(in real data many groups). I want to include the missing records on the basis of the st column. and the values must be kept as 0.

My output must look like

id  col  st  value
243    1   1    2.4
0      1   2     0
243    1   3    3.8
0      1   4     0
243    1   5    3.7

and so on

332    2   1    2.0
334    2   2    2.5
0      2   3     0
332    2   4    3.4
0      2   5     0
0      2   6     0
0      2   7     0
332    2   8    1.2

How can I do this in pandas ?

like image 524
Archit Avatar asked Nov 28 '25 02:11

Archit


2 Answers

Use DataFrame.reindex per groups with GroupBy.apply and range:

df = (df.set_index('st')
        .groupby('col')['id','value']
        .apply(lambda x: x.reindex(range(x.index.min(), x.index.max() + 1), fill_value=0))
        .reset_index()
       )

print (df)
    col  st   id  value
0     1   1  243    2.4
1     1   2    0    0.0
2     1   3  243    3.8
3     1   4    0    0.0
4     1   5  243    3.7
5     1   6    0    0.0
6     1   7    0    0.0
7     1   8    0    0.0
8     1   9  243    5.6
9     1  10    0    0.0
10    1  11    0    0.0
11    1  12  443    1.2
12    1  13    0    0.0
13    1  14    0    0.0
14    1  15    0    0.0
15    1  16    0    0.0
16    1  17    0    0.0
17    1  18  443    0.2
18    1  19    0    0.0
19    1  20    0    0.0
20    1  21    0    0.0
21    1  22    0    0.0
22    1  23  443    2.1
23    2   1  332    2.0
24    2   2  334    2.5
25    2   3    0    0.0
26    2   4  332    3.4
27    2   5    0    0.0
28    2   6    0    0.0
29    2   7    0    0.0
30    2   8  332    1.2
31    2   9    0    0.0
32    2  10    0    0.0
33    2  11    0    0.0
34    2  12    0    0.0
35    2  13    0    0.0
36    2  14  333    2.4
like image 183
jezrael Avatar answered Nov 29 '25 17:11

jezrael


Method using unnesting , first create the range by using groupby + agg , then we just need explode it and merge

s=df.groupby(['id','col']).st.agg(['min','max'])
s['st']=[ list(range(x,y+1)) for x , y in zip(s['min'],s['max'])]
newdf=unnesting(s.drop(['min','max'],1).reset_index(),['st']).merge(df,how='left').fillna(0)

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')
like image 31
BENY Avatar answered Nov 29 '25 18:11

BENY



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!