This is little convoluted, but I'll just show my data
I constructed following dataframe:
Mid_XYZ Mid_YYY Mid_ZZZ Select1 Select2
867 1019.11 1027.64 1022.68 XYZ YYY
873 1018.04 1027.58 1022.81 XYZ ZZZ
I would want to select values from columns based on Select1
and Select2
strings by matching on part of a column name. In a first row, this would be
1019.11
and 1027.64
(column Mid_XYZ
and Mid_YYY
) - because Select1
has string XYZ
and Select2
has string YYY
.
where, in a second row
1018.04
and 1022.81
(column Mid_XYZ
and Mid_ZZZ
)
Later, I plan to store sum of those values in new column. DataFrame will look like this
Mid_XYZ Mid_YYY Mid_ZZZ Select1 Select2 Sum
867 1019.11 1027.64 1022.68 XYZ YYY 2046.75
873 1018.04 1027.58 1022.81 XYZ ZZZ 2040.85
I can change column names to exact matching, but there should be some solution with regex? I know about df.filter(regex='XYZ')
, but how can I do it row-wise?
Use the following vectorized solution:
import numpy as np
# clean rows
clean = df.columns.str.replace('^Mid_', '', regex=True)
# find matching column indices
s1 = np.argmax(clean.values == df['Select1'].values[:, None], axis=1)
s2 = np.argmax(clean.values == df['Select2'].values[:, None], axis=1)
# index and sum
df['Sum'] = df.values[np.arange(len(s1)), s1] + df.values[np.arange(len(s2)), s2]
print(df)
Output
Mid_XYZ Mid_YYY Mid_ZZZ Select1 Select2 Sum
867 1019.11 1027.64 1022.68 XYZ YYY 2046.75
873 1018.04 1027.58 1022.81 XYZ ZZZ 2040.85
import pandas as pd
If you have:
df=pd.DataFrame.from_dict({'Mid_XYZ':[1019.11,1018.04],
'Mid_YYY':[1027.64,1027.58],
'Mid_ZZZ':[1022.68,1022.81],
'Select1':['XYZ','XYZ'],
'Select2':['YYY','ZZZ']})
You can do:
df['Sum']=df.apply(lambda row:
row['Mid_'+row['Select1']]+\
row['Mid_'+row['Select2']],
axis=1)
df
will be:
Mid_XYZ Mid_YYY Mid_ZZZ Select1 Select2 Sum
0 1019.11 1027.64 1022.68 XYZ YYY 2046.75
1 1018.04 1027.58 1022.81 XYZ ZZZ 2040.85
If you don't like lambda
functions, can achieve the same result by defining a function:
def sumfunc(row):
return row['Mid_'+row['Select1']]+row['Mid_'+row['Select2']]
Then:
df['Sum']=df.apply(sumfunc,axis=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With