How to calculate ratio of values in a pandas dataframe column?

Tags:

pandas

I'm new to pandas and decided to learn it by playing around with some data I pulled from my favorite game's API. I have a dataframe with two columns "playerId" and "winner" like so:

playerStatus:
______________________
   playerId   winner
0    1848      True
1    1988      False
2    3543      True
3    1848      False
4    1988      False
...

Each row represents a match the player participated in. My goal is to either transform this dataframe or create a new one such that the win percentage for each playerId is calculated. For example, the above dataframe would become:

playerWinsAndTotals
_________________________________________
   playerId   wins  totalPlayed   winPct
0    1848      1        2         50.0000
1    1988      0        2         0.0000
2    3543      1        1         100.0000
...

It took quite a while of reading pandas docs, but I actually managed to achieve this by essentially creating two different tables (one to find the number of wins for each player, one to find the total games for each player), and merging them, then taking the ratio of wins to games played.

Creating the "wins" dataframe:

temp_df = playerStatus[['playerId', 'winner']].value_counts().reset_index(name='wins')
onlyWins = temp_df[temp_df['winner'] == True][['playerId', 'wins']]
onlyWins
_________________________
    playerId    wins
1     1670       483
3     1748       474
4     2179       468
6     4006       434
8     1668       392
...

Creating the "totals" dataframe:

totalPlayed = playerStatus['playerId'].value_counts().reset_index(name='totalCount').rename(columns={'index': 'playerId'})
totalPlayed
____________________

   playerId   totalCount
0    1670        961
1    1748        919
2    1872        877
3    4006        839
4    2179        837
...

Finally, merging them and adding the "winPct" column.

playerWinsAndTotals = onlyWins.merge(totalPlayed, on='playerId', how='left')
playerWinsAndTotals['winPct'] = playerWinsAndTotals['wins']/playerWinsAndTotals['totalCount'] * 100
playerWinsAndTotals
_____________________________________________

   playerId   wins   totalCount     winPct
0    1670      483      961       50.260146
1    1748      474      919       51.577802
2    2179      468      837       55.913978
3    4006      434      839       51.728248
4    1668      392      712       55.056180
...

Now, the reason I am posting this here is because I know I'm not taking full advantage of what pandas has to offer. Creating and merging two different dataframes just to find the ratio of player wins seems unnecessary. I feel like I took the "scenic" route on this one.

To anyone more experienced than me, how would you tackle this problem?

419

asked Aug 16 '21 01:08

Bankst

1 Answers

We can take advantage of the way that Boolean values are handled mathematically (True being 1 and False being 0) and use 3 aggregation functions sum, count and mean per group (groupby aggregate). We can also take advantage of Named Aggregation to both create and rename the columns in one step:

df = (
    df.groupby('playerId', as_index=False)
        .agg(wins=('winner', 'sum'),
             totalCount=('winner', 'count'),
             winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100

df:

   playerId  wins  totalCount  winPct
0      1848     1           2    50.0
1      1988     0           2     0.0
2      3543     1           1   100.0

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})

147

answered Oct 26 '22 22:10

Henry Ecker

Related questions
                            
                                NSWindow drag regions should only be invalidated on the Main Thread! This will throw an exception in the future
                            
                                How to concatenate a vector into rows of a numpy matrix?
                            
                                Perform sum over different slice of each row for 2D array
                            
                                Simple way to delete existing pods from Python
                            
                                AttributeError: module 'google.cloud.vision' has no attribute 'types'
                            
                                Sendgrid Authenticate with API Keys
                            
                                Pytorch RuntimeError: expected scalar type Float but found Byte
                            
                                What exactly is Keras's CategoricalCrossEntropy doing?
                            
                                Python, Avoid ugly nested for loop
                            
                                Google Ads API - "failed with status "PERMISSION_DENIED" - "User doesn't have permission to access customer."
                            
                                Django: What's the difference between Queryset.union() and the OR operator?
                            
                                With BERT Text Classification, ValueError: too many dimensions 'str' error occuring
                            
                                Example code from typing library causes TypeError: 'type' object is not subscriptable, why?
                            
                                Python regex to match 6-digit numbers of different formats
                            
                                How to efficiently perform addition over large loops in python
                            
                                Getting ImportError when using torchtext
                            
                                ImportError: cannot import name '_ColumnEntity' Ubuntu20.10 [duplicate]
                            
                                Regex to group words separated by space
                            
                                Python: Sorting items from top left to bottom right with OpenCV
                            
                                Pandas percentage change using group by

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With