
Pandas second largest value's column name

I am trying to find the column names associated with the largest and second-largest values in each row of a DataFrame. Here's a simplified example (the real one has over 500 columns):

Date  val1  val2 val3 val4
1990   5     7    1    10
1991   2     1    10   3
1992   10    9    6    1
1993   50    10   2    15
1994   1     15   7    8

Needs to become:

Date  1larg   2larg
1990  val4    val2
1991  val3    val4
1992  val1    val2
1993  val1    val4
1994  val2    val4

I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest?

AtotheSiv asked Sep 24 '14 11:09




1 Answer

(You don't have any duplicate maximum values in your rows, so I'll guess that if you have [1,1,2,2] you want val3 and val4 to be selected.)

One way would be to use the result of argsort as an index into a Series with the column names.

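import numpy as np
import pandas as pd

df = df.set_index("Date")
arank = np.argsort(df.to_numpy(), axis=1)  # column positions, ascending by value
# Reverse each row to descending order, keep the top two, map positions to names
ranked_cols = df.columns.to_numpy()[arank[:, ::-1][:, :2]]
new_frame = pd.DataFrame(ranked_cols, index=df.index)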

produces

         0     1
Date            
1990  val4  val2
1991  val3  val4
1992  val1  val2
1993  val1  val4
1994  val2  val4
1995  val4  val3

(where I've added an extra 1995 [1,1,2,2] row.)
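For anyone who wants to run this end to end, here is a self-contained sketch of the same argsort idea on the question's sample data plus the extra 1995 row. The `kind="stable"` argument and the `1larg`/`2larg` column labels are my additions, not part of the code above: a stable sort should make the tie handling reproducible, and the labels match the layout asked for in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus the tied 1995 row [1, 1, 2, 2]
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994, 1995],
        "val1": [5, 2, 10, 50, 1, 1],
        "val2": [7, 1, 9, 10, 15, 1],
        "val3": [1, 10, 6, 2, 7, 2],
        "val4": [10, 3, 1, 15, 8, 2],
    }
).set_index("Date")

# kind="stable" keeps tied values in their original column order,
# so the reversed ranking is the same on every run.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")

# Reverse each row (largest first) and keep the first two positions
top2 = order[:, ::-1][:, :2]

# Map integer positions back to column names
result = pd.DataFrame(
    df.columns.to_numpy()[top2],
    index=df.index,
    columns=["1larg", "2larg"],
)
print(result)
```

Because the ascending order is reversed afterwards, tied values come out with the later column first, which is why the 1995 row gives val4 and then val3.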

Alternatively, you could probably melt the frame into a flat (long) format, pick out the two largest values in each Date group, and then pivot the result back to wide format.
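A rough sketch of that melt-based route, in case it helps. The intermediate names (`flat`, `column`, `value`, `rank`) and the `1larg`/`2larg` labels are placeholders I picked for illustration, and I have only considered rows without ties here, so tied values may be ordered differently than in the argsort version.

```python
import pandas as pd

# Sample data from the question (no tied rows)
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
)

# Wide -> long: one row per (Date, column name, value)
flat = df.melt(id_vars="Date", var_name="column", value_name="value")

# Within each Date put the largest values first, then keep two rows per Date
top2 = (
    flat.sort_values(["Date", "value"], ascending=[True, False])
    .groupby("Date")
    .head(2)
    .copy()
)

# Label the two surviving rows of each Date as 1larg / 2larg
top2["rank"] = top2.groupby("Date").cumcount().map({0: "1larg", 1: "2larg"})

# Long -> wide again: one row per Date, one column per rank label
result = top2.pivot(index="Date", columns="rank", values="column")
print(result)
```

With 500+ columns this creates a long frame of rows times columns entries, so I would expect the argsort version to be the lighter of the two, though I have not timed them.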

DSM answered Oct 04 '22 06:10
