This is little convoluted, but I'll just show my data
I constructed following dataframe:
      Mid_XYZ  Mid_YYY  Mid_ZZZ Select1 Select2
867    1019.11   1027.64  1022.68   XYZ   YYY
873    1018.04   1027.58  1022.81   XYZ   ZZZ
I would want to select values from columns based on Select1 and Select2 strings by matching on part of a column name. In a first row, this would be
1019.11 and 1027.64 (column Mid_XYZ and Mid_YYY) - because Select1 has string XYZ and Select2 has string YYY.
where, in a second row
1018.04 and 1022.81 (column Mid_XYZ and Mid_ZZZ)
Later, I plan to store sum of those values in new column. DataFrame will look like this
      Mid_XYZ  Mid_YYY  Mid_ZZZ Select1 Select2 Sum
867    1019.11   1027.64  1022.68   XYZ   YYY   2046.75
873    1018.04   1027.58  1022.81   XYZ   ZZZ   2040.85
I can change column names to exact matching, but there should be some solution with regex? I know about df.filter(regex='XYZ'), but how can I do it row-wise?
Use the following vectorized solution:
import numpy as np
# clean rows
clean = df.columns.str.replace('^Mid_', '', regex=True)
# find matching column indices
s1 = np.argmax(clean.values == df['Select1'].values[:, None], axis=1)
s2 = np.argmax(clean.values == df['Select2'].values[:, None], axis=1)
# index and sum
df['Sum'] = df.values[np.arange(len(s1)), s1] + df.values[np.arange(len(s2)), s2]
print(df)
Output
     Mid_XYZ  Mid_YYY  Mid_ZZZ Select1 Select2      Sum
867  1019.11  1027.64  1022.68     XYZ     YYY  2046.75
873  1018.04  1027.58  1022.81     XYZ     ZZZ  2040.85
                        import pandas as pd
If you have:
df=pd.DataFrame.from_dict({'Mid_XYZ':[1019.11,1018.04],
                           'Mid_YYY':[1027.64,1027.58],
                           'Mid_ZZZ':[1022.68,1022.81],
                           'Select1':['XYZ','XYZ'],
                           'Select2':['YYY','ZZZ']})
You can do:
df['Sum']=df.apply(lambda row:
                   row['Mid_'+row['Select1']]+\
                   row['Mid_'+row['Select2']],
                   axis=1)
df will be:
   Mid_XYZ  Mid_YYY  Mid_ZZZ Select1 Select2      Sum
0  1019.11  1027.64  1022.68     XYZ     YYY  2046.75
1  1018.04  1027.58  1022.81     XYZ     ZZZ  2040.85
If you don't like lambda functions, can achieve the same result by defining a function:
def sumfunc(row):
    return row['Mid_'+row['Select1']]+row['Mid_'+row['Select2']]
Then:
df['Sum']=df.apply(sumfunc,axis=1)
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With