I am using pandas to analyse some election results. I have a DF, Results, which has a row for each constituency and columns representing the votes for the various parties (over 100 of them):
In[60]: Results.columns
Out[60]:
Index(['Constituency', 'Region', 'Country', 'ID', 'Type', 'Electorate',
       'Total', 'Unnamed: 9', '30-50', 'Above',
       ...
       'WP', 'WRP', 'WVPTFP', 'Yorks', 'Young', 'Zeb', 'Party', 'Votes',
       'Share', 'Turnout'],
      dtype='object', length=147)
So...
In[63]: Results.head()
Out[63]:
                         Constituency    Region   Country         ID    Type  \
PAID
1                            Aberavon     Wales     Wales  W07000049  County
2                           Aberconwy     Wales     Wales  W07000058  County
3                      Aberdeen North  Scotland  Scotland  S14000001   Burgh
4                      Aberdeen South  Scotland  Scotland  S14000002   Burgh
5     Aberdeenshire West & Kincardine  Scotland  Scotland  S14000058  County

      Electorate  Total  Unnamed: 9  30-50  Above  ...   WP  WRP  WVPTFP  \
PAID                                               ...
1          49821  31523         NaN    NaN    NaN  ...  NaN  NaN     NaN
2          45525  30148         NaN    NaN    NaN  ...  NaN  NaN     NaN
3          67745  43936         NaN    NaN    NaN  ...  NaN  NaN     NaN
4          68056  48551         NaN    NaN    NaN  ...  NaN  NaN     NaN
5          73445  55196         NaN    NaN    NaN  ...  NaN  NaN     NaN

      Yorks  Young  Zeb Party  Votes     Share   Turnout
PAID
1       NaN    NaN  NaN   Lab  15416  0.489040  0.632725
2       NaN    NaN  NaN   Con  12513  0.415052  0.662230
3       NaN    NaN  NaN   SNP  24793  0.564298  0.648550
4       NaN    NaN  NaN   SNP  20221  0.416490  0.713398
5       NaN    NaN  NaN   SNP  22949  0.415773  0.751528

[5 rows x 147 columns]
The per-constituency results for each party are given in the columns Results.loc[:, 'Unnamed: 9':'Zeb']
I can find the winning party (i.e. the party which polled the highest number of votes) and the number of votes it polled using:
RawResults = Results.loc[:, 'Unnamed: 9':'Zeb']
Results['Party'] = RawResults.idxmax(axis=1)
Results['Votes'] = RawResults.max(axis=1).astype(int)
But, I also need to know how many votes the second-place party got (and ideally its index/name). So is there any way in pandas to return the second highest value/index in a set of columns for each row?
Python's pandas library provides easy ways to aggregate data and calculate metrics. Finding the top 5 values for each group can be done as part of a groupby. The method that helps here is nlargest().
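As a rough illustration of nlargest() combined with a groupby, here is a minimal sketch. The frame, the column names Region and Votes, and the numbers are made up for the example. It takes the top 2 rather than the top 5 to keep the data small.

```python
import pandas as pd

# Hypothetical data: one row per candidate, grouped by region.
df = pd.DataFrame({
    "Region": ["Wales", "Wales", "Wales", "Scotland", "Scotland", "Scotland"],
    "Votes": [15416, 12513, 9000, 24793, 20221, 22949],
})

# For each region, keep the two largest vote counts.
# The result is a Series indexed by (Region, original row label).
top2 = df.groupby("Region")["Votes"].nlargest(2)
print(top2)
```

Changing the argument to nlargest(5) would give the top 5 per group instead.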
The max() method returns a Series with the maximum value of each column. By specifying the column axis ( axis='columns' ), the max() method searches column-wise and returns the maximum value for each row.
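The difference between the two axes can be seen in a small sketch like the one below. The party columns and vote counts are invented for illustration, and NaN stands for a party that did not stand in that seat.

```python
import numpy as np
import pandas as pd

# Hypothetical vote table: rows are constituencies, columns are parties.
votes = pd.DataFrame(
    {
        "Con": [12513, 5000],
        "Lab": [15416, np.nan],
        "SNP": [np.nan, 24793],
    },
    index=["A", "B"],
)

col_max = votes.max()                    # default: one maximum per column
row_max = votes.max(axis="columns")      # one maximum per row
winner = votes.idxmax(axis="columns")    # column label of each row's maximum
```

Both max() and idxmax() skip NaN values by default, so a row with missing parties still yields a winner.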
To get the maximum value of each group, you can directly apply the pandas max() function to the selected column(s) from the result of pandas groupby.
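A minimal sketch of that pattern, again using an invented frame with Region and Votes columns:

```python
import pandas as pd

# Hypothetical data: several vote counts per region.
df = pd.DataFrame({
    "Region": ["Wales", "Wales", "Scotland", "Scotland", "Scotland"],
    "Votes": [15416, 12513, 24793, 20221, 22949],
})

# Select the column from the groupby result, then take the maximum per group.
max_per_region = df.groupby("Region")["Votes"].max()
print(max_per_region)
```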
To get the highest values of a column, you can use nlargest() :
df['High'].nlargest(2)
The above will give you the 2 highest values of column High.
You can also use nsmallest() to get the lowest values.
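The same methods can be applied row by row to get the runner-up in each constituency, which is what the question asks for. The sketch below is one possible approach, not the only one: the frame raw, the helper second_place and the output names Party2 and Votes2 are all hypothetical, and the vote counts are invented.

```python
import numpy as np
import pandas as pd

# Made-up vote counts; NaN means the party did not stand in that seat.
raw = pd.DataFrame(
    {
        "Con": [9000, 12513, np.nan],
        "Lab": [15416, 8000, 11000],
        "SNP": [np.nan, np.nan, 24793],
    },
    index=pd.Index([1, 2, 3], name="PAID"),
)

# Column-wise: the two highest and the two lowest values of one column.
lab_top2 = raw["Lab"].nlargest(2)
lab_bottom2 = raw["Lab"].nsmallest(2)


def second_place(row):
    """Return the second-placed party and its votes for one row."""
    top = row.dropna().nlargest(2)
    if len(top) < 2:  # fewer than two parties stood in this seat
        return pd.Series({"Party2": None, "Votes2": np.nan})
    return pd.Series({"Party2": top.index[1], "Votes2": top.iloc[1]})


# Row-wise: apply the helper to every row (axis=1).
runner_up = raw.apply(second_place, axis=1)
print(runner_up)
```

With the default keep='first', nlargest() resolves ties by taking the earlier column. Because apply(axis=1) runs a Python function per row, a NumPy-based sort of the underlying array may be faster on large frames.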