pandas

Question

I want to populate the empty columns 'web' 'mob 'app' by summing for each of the relevant dates in df2

df1:

id      start       end         web mob app
12345   2018-01-17  2018-01-20
12346   2018-01-19  2018-01-22
12347   2018-01-20  2018-01-23
12348   2018-01-20  2018-01-23
12349   2018-01-21  2018-01-24

df2:

id      date        web mob app
12345   2018-01-17  7   17  10
12345   2018-01-18  9   18  7
12345   2018-01-19  3   19  15
12345   2018-01-20  6   17  8
12345   2018-01-21  8   9   13
12345   2018-01-22  4   15  12
12345   2018-01-23  8   11  13
12345   2018-01-24  9   16  14
12346   2018-01-17  3   17  12
12346   2018-01-18  4   19  4
12346   2018-01-19  6   13  10
12346   2018-01-20  1   15  6
12346   2018-01-21  4   12  11
12346   2018-01-22  5   20  12
12346   2018-01-23  8   13  14
12346   2018-01-24  6   18  8

This for loop will populate the 'web' column:

column = []

for i in df1.index:
    column.append(df2[(df2['date'] >= df1['start'].iloc[i]) 
        & (df2['date'] <= df1['end'].iloc[i]) 
        & (df2['id'] == df1['id'].iloc[i])].sum()['web'])

df1['web'] = column

I want to be able to populate all 3 columns with one for loop, rather than doing 3 separate loops.

I have a feeling that using something like appending this

.agg({'web':'sum', 'mob':'sum', 'app':'sum'})

to a 2 dimensional list could be the answer.

Also... is there a more efficient way to do this than using for loops? Maybe by using numpy.where? I'm finding that running multiple for loops over large data sets can be very very slow.

BENY · Accepted Answer

IIUC

s=df1.merge(df2,on='id',how='left')
output=s[(s.start<=s.date)&(s.end>=s.date)].groupby('id').sum()
output
Out[991]: 
        web   mob   app
id                     
12345  25.0  71.0  40.0
12346  16.0  60.0  39.0

Then we using merge again

df1.merge(output.reset_index(),how='left').fillna(0)
Out[995]: 
      id      start        end   web   mob   app
0  12345 2018-01-17 2018-01-20  25.0  71.0  40.0
1  12346 2018-01-19 2018-01-22  16.0  60.0  39.0
2  12347 2018-01-20 2018-01-23   0.0   0.0   0.0
3  12348 2018-01-20 2018-01-23   0.0   0.0   0.0
4  12349 2018-01-21 2018-01-24   0.0   0.0   0.0

pandas - using a for loop to append multiple columns to a dataframe

Tags:

python

for-loop

numpy

Ben Sharkey

1 Answers

BENY

Recent Activity

Donate For Us