I have a dataframe df:
import pandas as pd
s = {'id': [243,243, 243, 243, 443,443,443, 332,334,332,332, 333],
'col':[1,1,1,1,1,1,1,2,2,2,2,2],
'st': [1,3,5,9,12, 18,23, 1,2,4,8,14],
'value':[2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1, 2.0, 2.5, 3.4, 1.2, 2.4]}
df = pd.DataFrame(s)
It looks like:
id col st value
0 243 1 1 2.4
1 243 1 3 3.8
2 243 1 5 3.7
3 243 1 9 5.6
4 443 1 12 1.2
5 443 1 18 0.2
6 443 1 23 2.1
7 332 2 1 2.0
8 334 2 2 2.5
9 332 2 4 3.4
10 332 2 8 1.2
11 333 2 14 2.4
The data has two groups, col 1 and col 2 (the real data has many more). Within each group I want to add the records that are missing from the st column, with id and value set to 0 in the added rows.
My output must look like:
id col st value
243 1 1 2.4
0 1 2 0
243 1 3 3.8
0 1 4 0
243 1 5 3.7
and so on
332 2 1 2.0
334 2 2 2.5
0 2 3 0
332 2 4 3.4
0 2 5 0
0 2 6 0
0 2 7 0
332 2 8 1.2
How can I do this in pandas?
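Use DataFrame.reindex per group with GroupBy.apply and range. Note that the columns must be selected with a list (double brackets); selecting them as ['id','value'] raises an error in recent pandas versions:
df = (df.set_index('st')
        .groupby('col')[['id','value']]
        .apply(lambda x: x.reindex(range(x.index.min(), x.index.max() + 1), fill_value=0))
        .reset_index()
     )
print(df)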
col st id value
0 1 1 243 2.4
1 1 2 0 0.0
2 1 3 243 3.8
3 1 4 0 0.0
4 1 5 243 3.7
5 1 6 0 0.0
6 1 7 0 0.0
7 1 8 0 0.0
8 1 9 243 5.6
9 1 10 0 0.0
10 1 11 0 0.0
11 1 12 443 1.2
12 1 13 0 0.0
13 1 14 0 0.0
14 1 15 0 0.0
15 1 16 0 0.0
16 1 17 0 0.0
17 1 18 443 0.2
18 1 19 0 0.0
19 1 20 0 0.0
20 1 21 0 0.0
21 1 22 0 0.0
22 1 23 443 2.1
23 2 1 332 2.0
24 2 2 334 2.5
25 2 3 0 0.0
26 2 4 332 3.4
27 2 5 0 0.0
28 2 6 0 0.0
29 2 7 0 0.0
30 2 8 332 1.2
31 2 9 0 0.0
32 2 10 0 0.0
33 2 11 0 0.0
34 2 12 0 0.0
35 2 13 0 0.0
36 2 14 333 2.4
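To see what the lambda does, it can help to run the same reindex on a single group in isolation. The snippet below is only an illustrative sketch: the names group, full_range and filled are made up here, and the data is just the col == 2 rows from the question, already indexed by st.

```python
import pandas as pd

# Rows of the col == 2 group from the question, indexed by st.
group = pd.DataFrame(
    {"id": [332, 334, 332, 332, 333],
     "value": [2.0, 2.5, 3.4, 1.2, 2.4]},
    index=pd.Index([1, 2, 4, 8, 14], name="st"),
)

# Every integer from the smallest to the largest st in this group.
full_range = range(group.index.min(), group.index.max() + 1)

# reindex keeps the existing rows and creates the missing ones;
# fill_value=0 puts 0 in every column of the newly created rows.
filled = group.reindex(full_range, fill_value=0)
print(filled)
```

GroupBy.apply simply repeats this step for every col group and stacks the results, which is why reset_index() afterwards restores both col and st as ordinary columns.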
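Method using unnesting: first create the range of st values with groupby + agg, then explode it and merge it back onto df. Because this groups by both id and col, the gaps are filled within each id, and the added rows keep their id rather than getting 0. The helper has to be defined before it is called, and it needs numpy:
import numpy as np

def unnesting(df, explode):
    # repeat each index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # flatten every list column into one long column
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')

s = df.groupby(['id','col']).st.agg(['min','max'])
s['st'] = [list(range(x, y + 1)) for x, y in zip(s['min'], s['max'])]
newdf = unnesting(s.drop(columns=['min','max']).reset_index(), ['st']).merge(df, how='left').fillna(0)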
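On pandas 0.25 or later, the built-in DataFrame.explode can stand in for a hand-written unnesting helper. The following is a sketch rather than the answerer's code: it groups by col only (as the question asks) instead of by id and col, and the names bounds and full are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [243, 243, 243, 243, 443, 443, 443, 332, 334, 332, 332, 333],
    "col": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "st": [1, 3, 5, 9, 12, 18, 23, 1, 2, 4, 8, 14],
    "value": [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1, 2.0, 2.5, 3.4, 1.2, 2.4],
})

# Smallest and largest st per col group.
bounds = df.groupby("col")["st"].agg(["min", "max"])

# One list of consecutive st values per group, then one row per st value.
bounds["st"] = [list(range(lo, hi + 1))
                for lo, hi in zip(bounds["min"], bounds["max"])]
full = bounds[["st"]].explode("st").reset_index()
full["st"] = full["st"].astype("int64")  # explode leaves an object column

# The left merge keeps every (col, st) pair; pairs that are absent
# from df get NaN in id and value, which fillna turns into 0.
newdf = full.merge(df, on=["col", "st"], how="left").fillna(0)
newdf["id"] = newdf["id"].astype("int64")  # NaN had forced id to float
newdf = newdf[["id", "col", "st", "value"]]
print(newdf.head())
```

Grouping by col alone is what makes the added rows come out with id 0, matching the layout requested in the question.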