I am trying to find column name associated with the largest and second largest values in a DataFrame, here's a simplified example (the real one has over 500 columns):
Date val1 val2 val3 val4
1990 5 7 1 10
1991 2 1 10 3
1992 10 9 6 1
1993 50 10 2 15
1994 1 15 7 8
Needs to become:
Date 1larg 2larg
1990 val4 val2
1991 val3 val4
1992 val1 val2
1993 val1 val4
1994 val2 val4
I can find the column name with the largest value (i,e, 1larg above) with idxmax, but how can I find the second largest?
columns[2] then df[df. columns[2]] will return all columns of that name and is a dataframe, not a series object.
head(n) to get the first n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the start). By default n = 5, it return first 5 rows if value of n is not passed to the method.
The size property returns the number of elements in the DataFrame. The number of elements is the number of rows * the number of columns.
Pandas DataFrame max() Method The max() method returns a Series with the maximum value of each column. By specifying the column axis ( axis='columns' ), the max() method searches column-wise and returns the maximum value for each row.
(You don't have any duplicate maximum values in your rows, so I'll guess that if you have [1,1,2,2]
you want val3
and val4
to be selected.)
One way would be to use the result of argsort
as an index into a Series with the column names.
df = df.set_index("Date")
arank = df.apply(np.argsort, axis=1)
ranked_cols = df.columns.to_series()[arank.values[:,::-1][:,:2]]
new_frame = pd.DataFrame(ranked_cols, index=df.index)
produces
0 1
Date
1990 val4 val2
1991 val3 val4
1992 val1 val2
1993 val1 val4
1994 val2 val4
1995 val4 val3
(where I've added an extra 1995 [1,1,2,2]
row.)
Alternatively, you could probably melt
into a flat format, pick out the largest two values in each Date group, and then turn it again.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With