
Find the difference between the max value and 2nd highest value within a subset of pandas columns

I have a fairly large dataframe:

    A   B   C   D
0  17  36  45  54
1  18  23  17  17
2  74  47   8  46
3  48  38  96  83

I am trying to create a new column equal to ((max value of the columns) - (2nd highest value)) / (2nd highest value), computed per row.

In this example it would look something like:

    A   B   C   D  Diff
0  17  36  45  54   .20
1  18  23  17  17   .28
2  74  47   8  46   .57
3  48  38  96  83   .16

I've tried df['diff'] = df.loc[:, 'A': 'D'].max(axis=1) - df.iloc[:df.index.get_loc(df.loc[:, 'A': 'D'].idxmax(axis=1))] / ...

but even that part of the formula returns an error, never mind including the final division. I'm sure there must be an easier way to go about this.

Edit: Additionally, I am also trying to get the difference between the max value and the column that immediately precedes the max value. I know this is a somewhat different question, but I would appreciate any insight. Thank you!
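A possible sketch for the follow-up in the edit, assuming "precedes" means the column immediately to the left of the max in A..D order. The names `vals`, `pos`, and `PrevDiff` are hypothetical, and returning NaN when the max sits in the first column is an assumption, since the question does not say what should happen there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [17, 18, 74, 48], "B": [36, 23, 47, 38],
     "C": [45, 17, 8, 96], "D": [54, 17, 46, 83]}
)

# Work on the A..D subset as a plain float array
vals = df.loc[:, "A":"D"].to_numpy(dtype=float)
rows = np.arange(len(vals))

# Column position of each row's max (the first one wins on ties)
pos = vals.argmax(axis=1)
row_max = vals[rows, pos]

# Value in the column just left of the max; clip keeps the index non-negative
prev = vals[rows, np.clip(pos - 1, 0, None)]

# Assumption: no preceding column exists when the max is in column A, so use NaN
df["PrevDiff"] = np.where(pos > 0, row_max - prev, np.nan)
```

With the sample data, row 2 has its max in column A, so it is the one row that gets NaN here.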

Rick Batra asked Feb 16 '21 04:02


2 Answers

One way is to use pandas.Series.nlargest with pct_change, applied row-wise to the columns of interest:

df["Diff"] = df.loc[:, "A":"D"].apply(lambda x: x.nlargest(2).pct_change(-1).iloc[0], axis=1)

Output:

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627
Chris answered Sep 28 '22 08:09


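Another way is to apply a user-defined function:

def get_pct(x):
    # After sorting, the last two values are the 2nd highest and the max
    xmax2, xmax = x.sort_values().tail(2)
    return (xmax - xmax2) / xmax2

df['Diff'] = df.loc[:, 'A':'D'].apply(get_pct, axis=1)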

Output:

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627
Quang Hoang answered Sep 28 '22 07:09
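Since the question mentions a fairly large dataframe, a vectorized variant may be worth trying. This is only a sketch and not taken from either answer: it swaps the row-wise apply for a NumPy sort along each row, and the names `vals`, `second`, and `top` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [17, 18, 74, 48], "B": [36, 23, 47, 38],
     "C": [45, 17, 8, 96], "D": [54, 17, 46, 83]}
)

# Sort each row of the A..D subset in ascending order
vals = np.sort(df.loc[:, "A":"D"].to_numpy(dtype=float), axis=1)

# The last two columns now hold the 2nd highest value and the max
second, top = vals[:, -2], vals[:, -1]

df["Diff"] = (top - second) / second
```

If the max occurs twice in a row, the 2nd highest equals the max and the result is 0, which matches what nlargest(2) would give.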