I have a data frame with columns car_x and car_y, van_x and van_y, and bus_x and bus_y. I need a column that is car_x * car_y + van_x * van_y + bus_x * bus_y.
The following code doesn't work:
modes = 'car', 'van', 'bus'
for mode in modes:
df['{var}'] = df['{var}_x']*df['{var}_y']
I would then just sum across df['car'], df['van'] and df['bus'] but the syntax above is off.
To fix your code, use f-strings so that Python inserts the variable's value instead of the literal string "{var}". Note also that your loop variable is named mode, not var, so that is the name to use inside the braces.
for mode in modes:
    df[mode] = df[f'{mode}_x'] * df[f'{mode}_y']
But this would need an additional sum step to get "result".
df['result'] = df[list(modes)].sum(axis=1)
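Putting the loop and the sum step together, here is a minimal sketch on made-up data. The sample values are hypothetical; only the column naming pattern comes from the question.

```python
import pandas as pd

# Hypothetical sample data following the <mode>_x / <mode>_y naming pattern.
df = pd.DataFrame({
    'car_x': [1, 2], 'car_y': [3, 4],
    'van_x': [5, 6], 'van_y': [7, 8],
    'bus_x': [9, 10], 'bus_y': [11, 12],
})

modes = ('car', 'van', 'bus')
for mode in modes:
    # One product column per mode, e.g. df['car'] = car_x * car_y
    df[mode] = df[f'{mode}_x'] * df[f'{mode}_y']

# Row-wise sum of the per-mode products
df['result'] = df[list(modes)].sum(axis=1)
print(df['result'].tolist())  # [137, 176]
```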
Let's cut out the extra step and do this a lot faster, using einsum here. Filter out your _x and _y columns, and then use einsum to specify a sum-of-products operation.
import numpy as np
x = df.filter(like='_x')
y = df.filter(like='_y')
df['result'] = np.einsum('ij,ij->i', x, y)
Thanks to the filter step, there is no need to maintain a separate modes list.
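One caveat: the einsum approach assumes the _x and _y columns line up in the same order, and filter(like=...) matches the substring anywhere in a column name. A more defensive sketch (the sample data and the pairing check are my own additions, not part of the answer above) anchors the match to the suffix and verifies the pairing first:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data following the <mode>_x / <mode>_y naming pattern.
df = pd.DataFrame({
    'car_x': [1, 2], 'car_y': [3, 4],
    'van_x': [5, 6], 'van_y': [7, 8],
    'bus_x': [9, 10], 'bus_y': [11, 12],
})

# Match only columns that END in _x or _y.
x = df.filter(regex=r'_x$')
y = df.filter(regex=r'_y$')

# Make sure column k of x pairs with column k of y (same prefix, same order).
x_prefixes = [c[:-2] for c in x.columns]
y_prefixes = [c[:-2] for c in y.columns]
if x_prefixes != y_prefixes:
    raise ValueError('_x and _y columns are not aligned')

# 'ij,ij->i': multiply element-wise, then sum over the column axis j.
result = np.einsum('ij,ij->i', x.to_numpy(), y.to_numpy())
print(result.tolist())  # [137, 176]
```

If the columns could appear in a different order, reordering y with y[[p + '_y' for p in x_prefixes]] before the einsum call would be one way to align them.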
I would use groupby:
df.groupby(df.columns.str.split('_').str[0],axis=1).prod()[['car', 'van', 'bus']].sum(1)
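As far as I know, recent pandas releases deprecate passing axis=1 to groupby. A sketch of the same idea that avoids it is to transpose, group the rows by prefix, and transpose back. The sample data is hypothetical, and the transposes assume all columns share a numeric dtype.

```python
import pandas as pd

# Hypothetical sample data following the <mode>_x / <mode>_y naming pattern.
df = pd.DataFrame({
    'car_x': [1, 2], 'car_y': [3, 4],
    'van_x': [5, 6], 'van_y': [7, 8],
    'bus_x': [9, 10], 'bus_y': [11, 12],
})

# Prefix before the underscore: 'car', 'car', 'van', 'van', 'bus', 'bus'
prefixes = df.columns.str.split('_').str[0]

# Transpose so the original columns become rows, multiply within each
# prefix group, then transpose back to one product column per prefix.
products = df.T.groupby(prefixes).prod().T

result = products[['car', 'van', 'bus']].sum(axis=1)
print(result.tolist())  # [137, 176]
```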