For the dataframe data below:
x y a b c
2 6 12 1 2
1 2 4 6 8
I want a new column (i.e. d) that holds the name of the column with the maximum value, considering only a, b, and c.
cols
a
c
I'm trying to find the maximum value across three columns and return the name of the column it comes from. But instead of comparing all the columns of the dataset, I want to compare only these three. I'm using the following code:
import numpy as np
def returncolname(row, colnames):
    return colnames[np.argmax(row.values)]
data['colmax'] = data.apply(lambda x: returncolname(x, data.columns), axis=1)
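The attempt above compares every column because apply is called on the whole frame. One way to limit it, sketched here under the assumption that the data is rebuilt by hand and that cols is an illustrative name for the three columns of interest, is to apply the same function to a column subset:

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt by hand.
data = pd.DataFrame({
    "x": [2, 1],
    "y": [6, 2],
    "a": [12, 4],
    "b": [1, 6],
    "c": [2, 8],
})

cols = ["a", "b", "c"]  # illustrative name: the only columns to compare


def returncolname(row, colnames):
    # np.argmax returns the position of the first maximum in the row.
    return colnames[np.argmax(row.values)]


# Apply to the subset so x and y never take part in the comparison.
data["colmax"] = data[cols].apply(lambda row: returncolname(row, cols), axis=1)
print(data)
```

This keeps the original function unchanged; only the frame passed to apply and the list of names are narrowed.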
The fastest solution I can think of is DataFrame.dot:
df.eq(df.max(1), axis=0).dot(df.columns)
Details
First, compute the maximum per row:
df.max(1)
0 12
1 8
dtype: int64
Next, find the positions these values come from:
df.eq(df.max(1), axis=0)
       x      y      a      b      c
0  False  False   True  False  False
1  False  False  False  False   True
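I use eq with axis=0 so that the row maxima are aligned on the index, which compares each row against its own maximum instead of matching the Series against the column labels.
Next, compute the dot product with the column list. Multiplying a column name by True keeps it and multiplying by False gives an empty string, so summing across a row concatenates the names of the columns that hold the maximum: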
df.eq(df.max(1), axis=0).dot(df.columns)
0 a
1 c
dtype: object
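In this frame the row maxima happen to sit in a and c, so comparing every column gives the right answer. To consider only a, b, and c, as the question asks, one option is to select those columns before building the mask. A minimal sketch, assuming the question's data is rebuilt by hand; cols and sub are illustrative names, and DataFrame.idxmax is included only as a cross-check, not as part of the approach above:

```python
import pandas as pd

# The question's data, rebuilt by hand.
df = pd.DataFrame({
    "x": [2, 1],
    "y": [6, 2],
    "a": [12, 4],
    "b": [1, 6],
    "c": [2, 8],
})

cols = ["a", "b", "c"]  # illustrative name: the only columns to compare
sub = df[cols]

# Same mask-and-dot idea, applied to the subset only.
df["d"] = sub.eq(sub.max(axis=1), axis=0).dot(sub.columns)

# Cross-check: idxmax returns the label of the first maximum in each row.
check = sub.idxmax(axis=1)

print(df)
print(check.tolist())
```

When each row has a single maximum, both results agree. idxmax reports only the first matching column, so the two differ once ties appear.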
If the max is not unique, use
df.eq(df.max(1), axis=0).dot(df.columns + ',').str.rstrip(',')
to get a comma-separated list of columns. For example, change a couple of values:
df.at[0, 'c'] = 12
df.at[1, 'y'] = 8
Everything is the same, but notice I append a comma to every column name:
df.columns + ','
Index(['x,', 'y,', 'a,', 'b,', 'c,'], dtype='object')
df.eq(df.max(1), axis=0).dot(df.columns + ',')
0 a,c,
1 y,c,
dtype: object
From this, strip any trailing commas:
df.eq(df.max(1), axis=0).dot(df.columns + ',').str.rstrip(',')
0 a,c
1 y,c
dtype: object
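The tie-handling variant can be restricted to a, b, and c in the same way. A small sketch, assuming the two modified values from this example (c set to 12 in row 0, y set to 8 in row 1) and illustrative names cols, sub, mask, and tied; because y is left out of the comparison, row 1 reports only c:

```python
import pandas as pd

# The question's data with the two modified values from this example.
df = pd.DataFrame({
    "x": [2, 1],
    "y": [6, 8],
    "a": [12, 4],
    "b": [1, 6],
    "c": [12, 8],
})

cols = ["a", "b", "c"]  # illustrative name: the only columns to compare
sub = df[cols]

# True wherever a value equals its row's maximum within the subset.
mask = sub.eq(sub.max(axis=1), axis=0)

# Append a comma to each name, concatenate via dot, then strip the last comma.
tied = mask.dot(sub.columns + ",").str.rstrip(",")
print(tied.tolist())  # ['a,c', 'c']
```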