 

Find the column index which has the maximum value for each row

Tags:

pandas

argmax

I have the following DataFrame:

   Name1 Scr1 Name2 Scr2 Name3 Scr3
   NY    21   CA    45   SF    37
   AZ    31   BK    46   AK    23

I am trying to get the maximum value in each row, together with the corresponding name:

df.idxmax(axis=1)

How do I get the corresponding name as well?

Expected Output:

   Name Hi_Scr
   CA    45
   BK    46
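A minimal sketch that extends the idxmax attempt above. It assumes each Name column is paired with the Scr column that has the same numeric suffix (Name2 goes with Scr2). Because the frame mixes names and numbers, idxmax is applied only to the Scr columns, and the winning label is then mapped to its Name column and picked row by row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name1': ['NY', 'AZ'], 'Scr1': [21, 31],
                   'Name2': ['CA', 'BK'], 'Scr2': [45, 46],
                   'Name3': ['SF', 'AK'], 'Scr3': [37, 23]})

scores = df.filter(like='Scr')                  # only the numeric Scr columns
best_col = scores.idxmax(axis=1)                # label of each row's max, 'Scr2' for both rows here
name_col = best_col.str.replace('Scr', 'Name', regex=False)   # assumes matching suffixes: 'Scr2' -> 'Name2'

# turn the Name labels into column positions, then pick one cell per row
col_pos = df.columns.get_indexer(name_col)
out = pd.DataFrame({'Name': df.to_numpy()[np.arange(len(df)), col_pos],
                    'Hi_Scr': scores.max(axis=1)})
print(out)
```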
asked Jan 04 '18 11:01 by msksantosh


3 Answers

I would do it with pd.wide_to_long, like this:

import pandas as pd

# wide_to_long needs a column that identifies each original row
df['id'] = df.index
ndf = pd.wide_to_long(df, ["Name", "Scr"], i="id", j="number").reset_index(0).set_index('Name')

#       id  Scr
# Name         
# NY     0   21
# AZ     1   31
# CA     0   45
# BK     1   46
# SF     0   37
# AK     1   23

# Thank you @jezrael for the improvement
ndf.groupby('id')['Scr'].agg(['max','idxmax']).rename(columns= {'max':'Hi_Scr','idxmax':'Name'})

    Hi_Scr Name
id             
0       45   CA
1       46   BK
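A self-contained variant of the same reshape, offered as a sketch rather than the answerer's exact code: it rebuilds the sample frame, uses assign instead of adding the id column in place, and selects the winning long-format row with idxmax so the columns come out in the Name, Hi_Scr order the question asks for:

```python
import pandas as pd

df = pd.DataFrame({'Name1': ['NY', 'AZ'], 'Scr1': [21, 31],
                   'Name2': ['CA', 'BK'], 'Scr2': [45, 46],
                   'Name3': ['SF', 'AK'], 'Scr3': [37, 23]})

# one row per (original row, slot); index levels are 'id' and 'number'
long_df = pd.wide_to_long(df.assign(id=df.index), ["Name", "Scr"], i="id", j="number")

# for each original row, the full (id, number) label of its highest score
winners = long_df.groupby(level="id")["Scr"].idxmax()
best = long_df.loc[winners.tolist()]

result = (best.reset_index(level="number", drop=True)
              .rename(columns={"Scr": "Hi_Scr"})[["Name", "Hi_Scr"]])
print(result)
```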
answered Oct 17 '22 03:10 by Bharath


Use:

  • select the Scr columns with filter and convert them to a NumPy array with values
  • get the position of each row's maximum with argmax
  • select the Name columns the same way and pick one name per row with integer-array indexing
  • get the maximum values themselves with max
  • build the result with the DataFrame constructor

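import numpy as np
import pandas as pd

a = df.filter(like='Scr').values                                  # scores as a 2-D array
b = a.argmax(axis=1)                                              # column position of each row's max
c = df.filter(like='Name').values[np.arange(len(df.index)), b]    # matching name per row
d = a.max(axis=1)                                                 # the max score itself

out = pd.DataFrame({'Name': c, 'Hi_Scr': d}, columns=['Name', 'Hi_Scr'])
print(out)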
  Name  Hi_Scr
0   CA      45
1   BK      46
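To see what the indexing step does, here is a small NumPy-only sketch; the third row, with a tie, and its names are hypothetical. argmax returns the position of the first maximum, so ties resolve to the leftmost column, and pairing np.arange(n) with those positions picks exactly one element per row:

```python
import numpy as np

scores = np.array([[21, 45, 37],
                   [31, 46, 23],
                   [50, 50, 10]])        # hypothetical third row with a tie
names = np.array([['NY', 'CA', 'SF'],
                  ['AZ', 'BK', 'AK'],
                  ['LA', 'TX', 'OH']])   # hypothetical names for that row

best = scores.argmax(axis=1)                    # first max position per row
picked = names[np.arange(len(scores)), best]    # row i paired with column best[i]
print(best)     # [1 1 0]
print(picked)   # ['CA' 'BK' 'LA']
```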

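A pandas-only solution is very similar (starting again from the original df): create a MultiIndex in the columns with str.extract, select each group with xs, and look up the names with lookup. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this version needs an older pandas: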

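# split 'Name1' -> ('Name', '1'), 'Scr1' -> ('Scr', '1'), ... (raw string avoids escape warnings)
a = df.columns.to_series().str.extract(r'(\D+)(\d+)', expand=False)
df.columns = pd.MultiIndex.from_tuples(a.values.tolist())

a = df.xs('Scr', axis=1)
b = a.idxmax(axis=1)                           # winning suffix per row, e.g. '2'
c = df.xs('Name', axis=1).lookup(df.index, b)  # requires pandas < 2.0
d = a.max(axis=1)

out = pd.DataFrame({'Name': c, 'Hi_Scr': d}, columns=['Name', 'Hi_Scr'])
print(out)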
  Name  Hi_Scr
0   CA      45
1   BK      46
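Because DataFrame.lookup no longer exists in pandas 2.0, the lookup step can be swapped for the factorize/reindex pattern that the pandas documentation suggests as its replacement. A sketch that rebuilds the sample frame and otherwise keeps the same MultiIndex idea:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name1': ['NY', 'AZ'], 'Scr1': [21, 31],
                   'Name2': ['CA', 'BK'], 'Scr2': [45, 46],
                   'Name3': ['SF', 'AK'], 'Scr3': [37, 23]})

# split 'Name1' -> ('Name', '1') etc. and use the pieces as two column levels
parts = df.columns.to_series().str.extract(r'(\D+)(\d+)')
df.columns = pd.MultiIndex.from_arrays([parts[0].to_numpy(), parts[1].to_numpy()])

scores = df.xs('Scr', axis=1)
names = df.xs('Name', axis=1)
best = scores.idxmax(axis=1)            # winning suffix per row

# replacement for names.lookup(names.index, best)
codes, uniques = pd.factorize(best)
picked = names.reindex(columns=uniques).to_numpy()[np.arange(len(names)), codes]

result = pd.DataFrame({'Name': picked, 'Hi_Scr': scores.max(axis=1)})
print(result)
```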

Timings:

import time

import numpy as np
import pandas as pd

# rebuild the original sample (df was modified above) and repeat it to 20,000 rows
df = pd.DataFrame({'Name1': ['NY', 'AZ'], 'Scr1': [21, 31],
                   'Name2': ['CA', 'BK'], 'Scr2': [45, 46],
                   'Name3': ['SF', 'AK'], 'Scr3': [37, 23]})
df = pd.concat([df]*10000).reset_index(drop=True)


def jez2(df):
    a = df.columns.to_series().str.extract(r'(\D+)(\d+)', expand=False)
    df.columns = pd.MultiIndex.from_tuples(a.values.tolist())

    a = df.xs('Scr', axis=1)
    b = a.idxmax(axis=1)
    c = df.xs('Name', axis=1).lookup(df.index, b)  # requires pandas < 2.0
    d = a.max(axis=1)

    return pd.DataFrame({'Name':c, 'Hi_Scr':d}, columns=['Name','Hi_Scr'])


def jez1(df):
    a = df.filter(like='Scr').values
    b = a.argmax(axis=1)
    c = df.filter(like='Name').values[np.arange(len(df.index)), b]
    d = a.max(axis=1)

    return pd.DataFrame({'Name':c, 'Hi_Scr':d}, columns=['Name','Hi_Scr'])

def dark(df):
    df['id'] = df.index
    ndf = pd.wide_to_long(df, ["Name", "Scr"], i="id", j="number").reset_index(0).set_index('Name')
    return ndf.groupby('id')['Scr'].agg(['max','idxmax']).rename(columns= {'max':'Hi_Scr','idxmax':'Name'})

# dark() adds an 'id' column and jez2() relabels the columns in place,
# so each function gets its own copy, made before its timer starts
for func in (jez1, dark, jez2):
    data = df.copy()
    start = time.perf_counter()
    result = func(data)
    elapsed = time.perf_counter() - start
    print(result.head())
    print(elapsed)

  Name  Hi_Scr
0   CA      45
1   BK      46
2   CA      45
3   BK      46
4   CA      45
#jez1 solution
0.015599966049194336
    Hi_Scr Name
id             
0       45   CA
1       46   BK
2       45   CA
3       46   BK
4       45   CA
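#dark solution
# (this run printed 1515070100.961423 here, which is not a duration: it subtracted an elapsed time from the clock, yielding a Unix timestamp)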
  Name  Hi_Scr
0   CA      45
1   BK      46
2   CA      45
3   BK      46
4   CA      45
#jez2 solution
0.04679989814758301
# (in this run, the jez2 figure also includes the jez1 time)
answered Oct 17 '22 03:10 by jezrael


Something like this:

import numpy as np
import pandas as pd

df1 = df.select_dtypes(include=[object])    # the Name columns
df2 = df.select_dtypes(exclude=[object])    # the Scr columns
pd.DataFrame({'Name': df1.values[np.where(df2.eq(df2.max(axis=1), axis=0))], 'Scr': df2.max(axis=1)})

Out[342]: 
  Name  Scr
0   CA   45
1   BK   46
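One caveat, with a sketch: np.where returns every cell equal to its row maximum, so a row with a tie contributes more than one name, and the DataFrame constructor then raises because the Name array is longer than the Scr series. The version below keeps only the first match per row. The tie in row 1 is hypothetical data, and the columns are picked by name with filter rather than by dtype:

```python
import numpy as np
import pandas as pd

# hypothetical data: row 1 has 46 in both Scr1 and Scr2
df = pd.DataFrame({'Name1': ['NY', 'AZ'], 'Scr1': [21, 46],
                   'Name2': ['CA', 'BK'], 'Scr2': [45, 46],
                   'Name3': ['SF', 'AK'], 'Scr3': [37, 23]})

df1 = df.filter(like='Name')
df2 = df.filter(like='Scr')

mask = df2.eq(df2.max(axis=1), axis=0)    # True wherever a row reaches its maximum
rows, cols = np.where(mask)               # positions of every True cell, in row order

# rows is sorted, so return_index gives the first hit for each row
_, first = np.unique(rows, return_index=True)
result = pd.DataFrame({'Name': df1.values[rows[first], cols[first]],
                       'Scr': df2.max(axis=1)})
print(result)
```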
answered Oct 17 '22 04:10 by BENY