I need to make my code faster. The problem is very simple, but I can't find a good way to do the calculation without looping through the whole DataFrame.
I have three DataFrames: A, B and C.
A and B have 3 columns each, and the following format:
A (10 rows):
Canal Gerencia grad
0 'ABC' 'DEF' 23
etc...
B (25 rows):
Marca Formato grad
0 'GHI' 'JKL' 43
etc...
DataFrame C, on the other hand, has 5 columns:
C (5000 rows):
Marca Formato Canal Gerencia grad
0 'GHI' 'JKL' 'ABC' 'DEF' -102
etc...
I need a vector with the same length as DataFrame 'C' that adds up the values of 'grad' from the three tables. For example:
m = 'GHI'
f = 'JKL'
c = 'ABC'
g = 'DEF'
res = C['grad'][C['Marca']==m][C['Formato']==f][C['Canal']==c][C['Gerencia']==g] + A['grad'][A['Canal']==c][A['Gerencia']==g] + B['grad'][B['Formato']==f][B['Marca']==m]
>>-36
I tried looping through the C DataFrame, but it is too slow. I understand I should avoid looping through the DataFrame, but I don't know how to do this. My current code is the following (it works, but it is VERY slow):
res = []
for row_index, row in C.iterrows():
    vec1 = A['Gerencia'] == row['Gerencia']
    vec2 = A['Canal'] == row['Canal']
    vec3 = B['Marca'] == row['Marca']
    vec4 = B['Formato'] == row['Formato']
    grad = row['grad']
    res.append(grad + sum(A['grad'][vec1][vec2]) + sum(B['grad'][vec3][vec4]))
I would really appreciate any help on making this routine quicker. Thank you!
IIUC, you need to merge C with A:
C = pd.merge(C, A, on=['Canal', 'Gerencia'])
(this will add a column to it) and then merge the result with B:
C = pd.merge(C, B, on=['Marca', 'Formato'])
(again adding a column to C).
At this point, check C for the names of the columns; say they are grad_foo, grad_bar, grad_baz. Then just add them:
C.grad_foo + C.grad_bar + C.grad_baz
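For what it's worth, here is a minimal end-to-end sketch of the merge approach on made-up data shaped like the frames in the question. A few things in it are my assumptions rather than part of the answer above: the column names grad_A and grad_B (renaming before the merge avoids having to check the suffixed names afterwards), the use of how="left" (pd.merge defaults to an inner join, which would drop rows of C that have no match, so the result could be shorter than C), and treating a missing match as 0, which is what sum() over an empty selection does in the original loop. The groupby step is only there to mirror the loop's sum() in case A or B contain repeated key pairs.

```python
import pandas as pd

# Made-up sample data with the same columns as A, B and C in the question.
A = pd.DataFrame({"Canal": ["ABC", "XYZ"],
                  "Gerencia": ["DEF", "UVW"],
                  "grad": [23, 5]})
B = pd.DataFrame({"Marca": ["GHI", "MNO"],
                  "Formato": ["JKL", "PQR"],
                  "grad": [43, 7]})
C = pd.DataFrame({"Marca": ["GHI", "MNO", "GHI"],
                  "Formato": ["JKL", "PQR", "JKL"],
                  "Canal": ["ABC", "XYZ", "QQQ"],   # "QQQ" has no match in A
                  "Gerencia": ["DEF", "UVW", "DEF"],
                  "grad": [-102, 10, 1]})

# Collapse repeated key pairs (mirrors sum() in the loop) and rename 'grad'
# so the merged column names are predictable. grad_A / grad_B are made-up names.
a = (A.groupby(["Canal", "Gerencia"], as_index=False)["grad"].sum()
       .rename(columns={"grad": "grad_A"}))
b = (B.groupby(["Marca", "Formato"], as_index=False)["grad"].sum()
       .rename(columns={"grad": "grad_B"}))

# Left merges keep every row of C, in C's original order.
merged = (C.merge(a, on=["Canal", "Gerencia"], how="left")
           .merge(b, on=["Marca", "Formato"], how="left"))

# Rows without a match get NaN from the merge; count them as 0 like the loop does.
res = merged["grad"] + merged["grad_A"].fillna(0) + merged["grad_B"].fillna(0)
print(res.tolist())
```

The first row reproduces the -36 from the question (-102 + 23 + 43). If every row of C is guaranteed to have a match in both A and B, the fillna(0) calls are unnecessary.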