 

Get first and second highest values in pandas columns

I am using pandas to analyse some election results. I have a DF, Results, which has a row for each constituency and columns representing the votes for the various parties (over 100 of them):

    In [60]: Results.columns
    Out[60]:
    Index(['Constituency', 'Region', 'Country', 'ID', 'Type', 'Electorate',
           'Total', 'Unnamed: 9', '30-50', 'Above',
           ...
           'WP', 'WRP', 'WVPTFP', 'Yorks', 'Young', 'Zeb', 'Party', 'Votes',
           'Share', 'Turnout'],
          dtype='object', length=147)

So...

    In [63]: Results.head()
    Out[63]:
                             Constituency    Region   Country         ID    Type  \
    PAID
    1                            Aberavon     Wales     Wales  W07000049  County
    2                           Aberconwy     Wales     Wales  W07000058  County
    3                      Aberdeen North  Scotland  Scotland  S14000001   Burgh
    4                      Aberdeen South  Scotland  Scotland  S14000002   Burgh
    5     Aberdeenshire West & Kincardine  Scotland  Scotland  S14000058  County

          Electorate  Total  Unnamed: 9  30-50  Above  ...   WP  WRP  WVPTFP  \
    PAID                                               ...
    1          49821  31523         NaN    NaN    NaN  ...  NaN  NaN     NaN
    2          45525  30148         NaN    NaN    NaN  ...  NaN  NaN     NaN
    3          67745  43936         NaN    NaN    NaN  ...  NaN  NaN     NaN
    4          68056  48551         NaN    NaN    NaN  ...  NaN  NaN     NaN
    5          73445  55196         NaN    NaN    NaN  ...  NaN  NaN     NaN

          Yorks  Young  Zeb  Party  Votes     Share   Turnout
    PAID
    1       NaN    NaN  NaN    Lab  15416  0.489040  0.632725
    2       NaN    NaN  NaN    Con  12513  0.415052  0.662230
    3       NaN    NaN  NaN    SNP  24793  0.564298  0.648550
    4       NaN    NaN  NaN    SNP  20221  0.416490  0.713398
    5       NaN    NaN  NaN    SNP  22949  0.415773  0.751528

    [5 rows x 147 columns]

The per-constituency results for each party are given in the columns Results.loc[:, 'Unnamed: 9':'Zeb'] (.ix, which was removed in pandas 1.0, behaves the same way for this label slice).

I can find the winning party (i.e. the party that polled the highest number of votes) and the number of votes it polled using:

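    # Party vote columns only ('Unnamed: 9' through 'Zeb')
    RawResults = Results.loc[:, 'Unnamed: 9':'Zeb']
    # Name of the party with the most votes in each row (NaN is skipped)
    Results['Party'] = RawResults.idxmax(axis=1)
    # Winning vote count in each row
    Results['Votes'] = RawResults.max(axis=1).astype(int)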
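A minimal, self-contained sketch of what the idxmax/max calls above do, using a hypothetical three-constituency frame (the party columns and vote counts are made up for illustration). NaN marks a party that did not stand in that constituency, and both methods skip it by default:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per constituency, one column per party.
# NaN means the party did not stand there, as in the question's data.
raw = pd.DataFrame(
    {
        "Lab": [15416, 9000, np.nan],
        "Con": [7000, 12513, 5000],
        "SNP": [np.nan, np.nan, 24793],
    },
    index=pd.Index([1, 2, 3], name="PAID"),
)

# idxmax/max along axis=1 work row by row and skip NaN (skipna=True).
party = raw.idxmax(axis=1)
votes = raw.max(axis=1).astype(int)

print(party.tolist())  # ['Lab', 'Con', 'SNP']
print(votes.tolist())  # [15416, 12513, 24793]
```

Columns that contain NaN are stored as floats, so max returns floats; that is why the .astype(int) step is needed.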

But I also need to know how many votes the second-placed party got (and ideally its index/name). Is there any way in pandas to return the second-highest value/index across a set of columns for each row?

asked Aug 21 '16 at 16:08 by TimGJ
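One possible way to get the runner-up (a sketch, not taken from the thread) is to apply Series.nlargest(2) to each row and take the second entry. It assumes every row has at least two parties with votes; the data and the helper name runner_up are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the question's party columns.
raw = pd.DataFrame(
    {
        "Lab": [15416, 9000, np.nan],
        "Con": [7000, 12513, 5000],
        "SNP": [np.nan, np.nan, 24793],
    },
    index=pd.Index([1, 2, 3], name="PAID"),
)


def runner_up(row: pd.Series) -> pd.Series:
    """Return the second-placed party and its votes for one row.

    Series.nlargest drops NaN, so parties that did not stand are ignored.
    Assumes at least two parties have votes in the row.
    """
    top2 = row.nlargest(2)
    return pd.Series({"Party2": top2.index[1], "Votes2": int(top2.iloc[1])})


# apply(axis=1) calls runner_up once per row; the returned Series
# are expanded into the columns Party2 and Votes2.
second = raw.apply(runner_up, axis=1)
print(second)
```

Because apply(axis=1) calls a Python function per row, this is simple but not vectorized.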




1 Answer

To get the highest values of a column, you can use nlargest():

    df['High'].nlargest(2)

The above returns the 2 highest values of column High as a Series that keeps their original index labels.
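A small sketch of the answer's call on a hypothetical df with a High column (the values and row labels are invented). It shows that the result keeps the original row labels, so it tells you where the values came from as well as what they are:

```python
import pandas as pd

# Hypothetical data with a column named "High", as in the answer.
df = pd.DataFrame({"High": [3.2, 7.5, 1.1, 6.0, 4.0]}, index=list("abcde"))

# The two largest values, sorted descending, with their row labels kept.
top2 = df["High"].nlargest(2)
print(top2)

# Second-highest value and the label of the row it came from.
second_value = top2.iloc[1]
second_label = top2.index[1]
print(second_label, second_value)  # d 6.0
```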


You can also use nsmallest() to get the lowest values.
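nlargest() and nsmallest() work on one Series at a time. For the per-row version of the question (second-highest across many party columns), a swapped-in vectorized alternative, not part of the answer above, is to sort each row with NumPy. This sketch uses hypothetical data and assumes each row has at least two non-NaN values; otherwise the runner-up comes out as -inf:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks parties that did not stand.
raw = pd.DataFrame(
    {
        "Lab": [15416, 9000, np.nan],
        "Con": [7000, 12513, 5000],
        "SNP": [np.nan, np.nan, 24793],
    },
    index=pd.Index([1, 2, 3], name="PAID"),
)

# Replace NaN with -inf so missing parties always sort to the bottom.
values = raw.to_numpy(dtype=float)
filled = np.where(np.isnan(values), -np.inf, values)

# Sort column positions ascending within each row (stable, so ties are
# handled deterministically): the last position is the winner and the
# second-to-last is the runner-up.
order = np.argsort(filled, axis=1, kind="stable")
second_pos = order[:, -2]

rows = np.arange(len(raw))
second_party = pd.Series(raw.columns[second_pos], index=raw.index)
second_votes = pd.Series(filled[rows, second_pos], index=raw.index)
print(pd.DataFrame({"Party2": second_party, "Votes2": second_votes}))
```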

answered Sep 21 '22 at 03:09 by Pedro Lobito