Find the difference between the max value and 2nd highest value within a subset of pandas columns

Tags: python, pandas, numpy

I have a fairly large dataframe:

	A	B	C	D
0	17	36	45	54
1	18	23	17	17
2	74	47	8	46
3	48	38	96	83

I am trying to create a new column equal to ((max value of the columns) - (2nd highest value)) / (2nd highest value), computed row by row.

In this example it would look something like:

	A	B	C	D	Diff
0	17	36	45	54	.20
1	18	23	17	17	.28
2	74	47	8	46	.57
3	48	38	96	83	.16

I've tried df['diff'] = df.loc[:, 'A': 'D'].max(axis=1) - df.iloc[:df.index.get_loc(df.loc[:, 'A': 'D'].idxmax(axis=1))] / ...

but even that part of the formula returns an error, never mind including the final division. I'm sure there must be an easier way to go about this.

Edit: Additionally, I am also trying to get the difference between the max value and the column that immediately precedes the max value. I know this is a somewhat different question, but I would appreciate any insight. Thank you!

asked Feb 16 '21 04:02

Rick Batra

2 Answers

One way using pandas.Series.nlargest with pct_change:

df["Diff"] = df.apply(lambda x: x.nlargest(2).pct_change(-1).iloc[0], axis=1)

Output:

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627
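
To see why this works, it may help to walk through a single row. The sketch below uses row 0 from the question; the variable names (row, top2, change, diff) are just for illustration. It relies on pct_change(-1) comparing each element with the one after it, so the first entry is (max - 2nd) / 2nd.

```python
import pandas as pd

# Row 0 of the example frame, as a Series indexed by column name.
row = pd.Series({"A": 17, "B": 36, "C": 45, "D": 54})

# The two largest values, in descending order: D=54, then C=45.
top2 = row.nlargest(2)

# periods=-1 compares each element with the NEXT one, so the first
# entry is (54 - 45) / 45 and the last entry has nothing to compare to.
change = top2.pct_change(-1)

# Take the first entry by position; the index labels are column names,
# so .iloc is used rather than [0].
diff = change.iloc[0]
print(diff)
```

If the frame has other columns that should not take part, one option is to slice first, e.g. df.loc[:, "A":"D"].apply(...), as in the question's attempt.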

answered Sep 28 '22 08:09

Chris

Another way is to apply a user-defined function:

def get_pct(x):
    # Sort ascending; the last two values are the 2nd highest and the max.
    xmax2, xmax = x.sort_values().tail(2)
    return (xmax - xmax2) / xmax2

df['Diff'] = df.apply(get_pct, axis=1)

Output:

    A   B   C   D      Diff
0  17  36  45  54  0.200000
1  18  23  17  17  0.277778
2  74  47   8  46  0.574468
3  48  38  96  83  0.156627
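
Since the question mentions a fairly large dataframe, a row-wise apply may be slow. A possible vectorized variant, not part of the answer above, is to sort each row with NumPy and take the last two columns. This is a minimal sketch that assumes the columns of interest are A through D and all numeric:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [17, 18, 74, 48],
    "B": [36, 23, 47, 38],
    "C": [45, 17, 8, 96],
    "D": [54, 17, 46, 83],
})

# Sort the values of each row in ascending order (axis=1 sorts within rows).
vals = np.sort(df.loc[:, "A":"D"].to_numpy(), axis=1)

# After sorting, the last column holds the max and the one before it
# holds the 2nd highest value.
second, top = vals[:, -2], vals[:, -1]

df["Diff"] = (top - second) / second
print(df)
```

Like the apply-based versions, this gives 0 when the maximum appears twice in a row, and it divides by zero if the 2nd highest value is 0.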

answered Sep 28 '22 07:09

Quang Hoang
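
Neither answer covers the edit in the question (max value minus the column that immediately precedes it). One possible sketch, assuming "precedes" means the column directly to the left in column order, that ties go to the first maximum, and that a maximum in the first column has no predecessor and should give NaN. The column name DiffPrev is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [17, 18, 74, 48],
    "B": [36, 23, 47, 38],
    "C": [45, 17, 8, 96],
    "D": [54, 17, 46, 83],
})

vals = df.loc[:, "A":"D"].to_numpy(dtype=float)
rows = np.arange(len(vals))

# Column position of the (first) maximum in each row.
imax = vals.argmax(axis=1)

# Value in the column just left of the max; NaN when the max is in
# the first column (np.maximum only keeps the lookup index valid).
prev = np.where(imax > 0, vals[rows, np.maximum(imax - 1, 0)], np.nan)

df["DiffPrev"] = vals[rows, imax] - prev
print(df["DiffPrev"].tolist())  # [9.0, 5.0, nan, 58.0]
```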
