I have a fairly large dataframe:
| | A | B | C | D |
|---|---|---|---|---|
| 0 | 17 | 36 | 45 | 54 |
| 1 | 18 | 23 | 17 | 17 |
| 2 | 74 | 47 | 8 | 46 |
| 3 | 48 | 38 | 96 | 83 |
I am trying to create a new column equal to (max value across the columns - 2nd highest value) / (2nd highest value), computed for each row.
In this example it would look something like:
| | A | B | C | D | Diff |
|---|---|---|---|---|---|
| 0 | 17 | 36 | 45 | 54 | .20 |
| 1 | 18 | 23 | 17 | 17 | .28 |
| 2 | 74 | 47 | 8 | 46 | .57 |
| 3 | 48 | 38 | 96 | 83 | .16 |
I've tried df['diff'] = df.loc[:, 'A': 'D'].max(axis=1) - df.iloc[:df.index.get_loc(df.loc[:, 'A': 'D'].idxmax(axis=1))] / ...
but even that part of the formula returns an error, never mind the final division. I'm sure there must be an easier way to go about this.
Edit: Additionally, I am also trying to get the difference between the max value and the column that immediately precedes the max value. I know this is a somewhat different question, but I would appreciate any insight. Thank you!
The diff() method returns a DataFrame with the difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
diff() computes the first discrete difference of an object along the given axis. The periods parameter sets how many positions to shift when forming the difference, and the axis parameter selects whether the difference is taken over rows (0) or columns (1).
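As a rough illustration of the periods and axis parameters described above, here is a minimal sketch that rebuilds the question's dataframe by hand (the variable names row_diff, col_diff and row_diff_2 are only for illustration):

```python
import pandas as pd

# The sample frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "A": [17, 18, 74, 48],
        "B": [36, 23, 47, 38],
        "C": [45, 17, 8, 96],
        "D": [54, 17, 46, 83],
    }
)

# Default: each row minus the previous row (the first row becomes NaN).
row_diff = df.diff()

# axis=1: each column minus the column to its left (column A becomes NaN).
col_diff = df.diff(axis=1)

# periods=2: each row minus the row two positions earlier.
row_diff_2 = df.diff(periods=2)
```

Note that diff() only compares neighbours by position, so on its own it does not find the largest or second-largest value in a row.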
To get the maximum value in a pandas column, use the max() function. For example, let’s get the maximum value achieved in the first attempt. We get 87.03 meters as the maximum distance thrown in the “Attemp1” column. Note that you can get the index corresponding to the max value with the pandas idxmax() function.
Max value for each column in the dataframe: similarly, you can get the max value for each column by applying the max function over the entire dataframe instead of a single column or a selection of columns. This returns the maximum value of each column of the dataframe df.
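The athletics data that snippet refers to is not shown here, so the following sketch uses the question's dataframe instead to show the same three calls: max() on one column, idxmax() on one column, and max() on the whole frame.

```python
import pandas as pd

# The sample frame from the question (not the athletics data, which is not shown).
df = pd.DataFrame(
    {
        "A": [17, 18, 74, 48],
        "B": [36, 23, 47, 38],
        "C": [45, 17, 8, 96],
        "D": [54, 17, 46, 83],
    }
)

# Maximum of a single column.
col_max = df["A"].max()         # 74

# Index label of the row holding that maximum.
col_max_label = df["A"].idxmax()  # 2

# Maximum of every column, returned as a Series indexed by column name.
per_column_max = df.max()       # A 74, B 47, C 96, D 83
```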
How do you get the max value of two or more columns in a pandas dataframe? To get the max value across the columns ['c1','c2','c3'], one solution is to use pandas.DataFrame.max.
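A minimal sketch of that approach, assuming a small made-up frame with columns c1, c2 and c3 (the values are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical data; only the column names come from the text above.
df = pd.DataFrame({"c1": [1, 7, 3], "c2": [4, 2, 9], "c3": [5, 6, 8]})

cols = ["c1", "c2", "c3"]

# axis=1 takes the maximum across the selected columns, one value per row.
df["row_max"] = df[cols].max(axis=1)   # 5, 7, 9

# A single overall maximum across all three columns.
overall_max = df[cols].max().max()     # 9
```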
In the same athletics example, the index returned by idxmax() can be used to look up the name of the athlete who threw the longest in the first attempt; the max value corresponds to “Neeraj Chopra”.
One way is to use pandas.Series.nlargest with pct_change:
df["Diff"] = df.apply(lambda x: x.nlargest(2).pct_change(-1).iloc[0], axis=1)
Output:
A B C D Diff
0 17 36 45 54 0.200000
1 18 23 17 17 0.277778
2 74 47 8 46 0.574468
3 48 38 96 83 0.156627
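Because the question mentions a fairly large dataframe, a row-wise apply may be slow. As an alternative that is not part of the answer above, here is a sketch that sorts each row with NumPy and reads off the two largest values; it assumes the columns of interest are A to D and that they are all numeric.

```python
import numpy as np
import pandas as pd

# The sample frame from the question.
df = pd.DataFrame(
    {
        "A": [17, 18, 74, 48],
        "B": [36, 23, 47, 38],
        "C": [45, 17, 8, 96],
        "D": [54, 17, 46, 83],
    }
)

cols = ["A", "B", "C", "D"]  # assumed: the columns to compare

# Sort each row in ascending order; the last two entries of every row
# are then the second-largest and the largest value.
sorted_vals = np.sort(df[cols].to_numpy(), axis=1)
second = sorted_vals[:, -2]
largest = sorted_vals[:, -1]

# (max - second highest) / second highest, computed for all rows at once.
df["Diff"] = (largest - second) / second
```

On the sample data this gives the same Diff values as the output shown above.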
One way is to apply a udf:
def get_pct(x):
    # The two largest values in ascending order: second-largest, then largest.
    xmax2, xmax = x.sort_values().tail(2)
    return (xmax - xmax2) / xmax2

df['Diff'] = df.apply(get_pct, axis=1)
Output:
A B C D Diff
0 17 36 45 54 0.200000
1 18 23 17 17 0.277778
2 74 47 8 46 0.574468
3 48 38 96 83 0.156627
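Neither answer addresses the edit to the question (the difference between the max value and the column immediately preceding it). One possible sketch follows, assuming that "precedes" means the column directly to the left in column order, that ties go to the first max, and that a max in the first column has no predecessor and should yield NaN. The column name DiffPrev is hypothetical.

```python
import numpy as np
import pandas as pd

# The sample frame from the question.
df = pd.DataFrame(
    {
        "A": [17, 18, 74, 48],
        "B": [36, 23, 47, 38],
        "C": [45, 17, 8, 96],
        "D": [54, 17, 46, 83],
    }
)

cols = ["A", "B", "C", "D"]  # assumed: the columns to compare

# Each column minus the column to its left; column A has no left
# neighbour, so it is NaN in every row.
col_diff = df[cols].diff(axis=1).to_numpy()

# Position of the (first) maximum in each row.
max_pos = df[cols].to_numpy().argmax(axis=1)

# For each row, pick the left-neighbour difference at the max position.
df["DiffPrev"] = col_diff[np.arange(len(df)), max_pos]
```

For the sample data this yields 9 (54 - 45), 5 (23 - 18), NaN (the max 74 is in the first column) and 58 (96 - 38).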