I want to populate the empty columns 'web' 'mob 'app' by summing for each of the relevant dates in df2
df1:
id start end web mob app
12345 2018-01-17 2018-01-20
12346 2018-01-19 2018-01-22
12347 2018-01-20 2018-01-23
12348 2018-01-20 2018-01-23
12349 2018-01-21 2018-01-24
df2:
id date web mob app
12345 2018-01-17 7 17 10
12345 2018-01-18 9 18 7
12345 2018-01-19 3 19 15
12345 2018-01-20 6 17 8
12345 2018-01-21 8 9 13
12345 2018-01-22 4 15 12
12345 2018-01-23 8 11 13
12345 2018-01-24 9 16 14
12346 2018-01-17 3 17 12
12346 2018-01-18 4 19 4
12346 2018-01-19 6 13 10
12346 2018-01-20 1 15 6
12346 2018-01-21 4 12 11
12346 2018-01-22 5 20 12
12346 2018-01-23 8 13 14
12346 2018-01-24 6 18 8
This for loop will populate the 'web' column:
column = []
for i in df1.index:
column.append(df2[(df2['date'] >= df1['start'].iloc[i])
& (df2['date'] <= df1['end'].iloc[i])
& (df2['id'] == df1['id'].iloc[i])].sum()['web'])
df1['web'] = column
I want to be able to populate all 3 columns with one for loop, rather than doing 3 separate loops.
I have a feeling that using something like appending this
.agg({'web':'sum', 'mob':'sum', 'app':'sum'})
to a 2 dimensional list could be the answer.
Also... is there a more efficient way to do this than using for loops? Maybe by using numpy.where? I'm finding that running multiple for loops over large data sets can be very very slow.
IIUC
s=df1.merge(df2,on='id',how='left')
output=s[(s.start<=s.date)&(s.end>=s.date)].groupby('id').sum()
output
Out[991]:
web mob app
id
12345 25.0 71.0 40.0
12346 16.0 60.0 39.0
Then we using merge again
df1.merge(output.reset_index(),how='left').fillna(0)
Out[995]:
id start end web mob app
0 12345 2018-01-17 2018-01-20 25.0 71.0 40.0
1 12346 2018-01-19 2018-01-22 16.0 60.0 39.0
2 12347 2018-01-20 2018-01-23 0.0 0.0 0.0
3 12348 2018-01-20 2018-01-23 0.0 0.0 0.0
4 12349 2018-01-21 2018-01-24 0.0 0.0 0.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With