I'm new to pandas and decided to learn it by playing around with some data I pulled from my favorite game's API. I have a dataframe with two columns "playerId" and "winner" like so:
playerStatus:
______________________
playerId winner
0 1848 True
1 1988 False
2 3543 True
3 1848 False
4 1988 False
...
Each row represents a match the player participated in. My goal is to either transform this dataframe or create a new one such that the win percentage for each playerId is calculated. For example, the above dataframe would become:
playerWinsAndTotals
_________________________________________
playerId wins totalPlayed winPct
0 1848 1 2 50.0000
1 1988 0 2 0.0000
2 3543 1 1 100.0000
...
It took quite a while of reading pandas docs, but I actually managed to achieve this by essentially creating two different tables (one to find the number of wins for each player, one to find the total games for each player), and merging them, then taking the ratio of wins to games played.
Creating the "wins" dataframe:
temp_df = playerStatus[['playerId', 'winner']].value_counts().reset_index(name='wins')
onlyWins = temp_df[temp_df['winner'] == True][['playerId', 'wins']]
onlyWins
_________________________
playerId wins
1 1670 483
3 1748 474
4 2179 468
6 4006 434
8 1668 392
...
Creating the "totals" dataframe:
totalPlayed = playerStatus['playerId'].value_counts().reset_index(name='totalCount').rename(columns={'index': 'playerId'})
totalPlayed
____________________
playerId totalCount
0 1670 961
1 1748 919
2 1872 877
3 4006 839
4 2179 837
...
Finally, merging them and adding the "winPct" column.
playerWinsAndTotals = onlyWins.merge(totalPlayed, on='playerId', how='left')
playerWinsAndTotals['winPct'] = playerWinsAndTotals['wins']/playerWinsAndTotals['totalCount'] * 100
playerWinsAndTotals
_____________________________________________
playerId wins totalCount winPct
0 1670 483 961 50.260146
1 1748 474 919 51.577802
2 2179 468 837 55.913978
3 4006 434 839 51.728248
4 1668 392 712 55.056180
...
Now, the reason I am posting this here is because I know I'm not taking full advantage of what pandas has to offer. Creating and merging two different dataframes just to find the ratio of player wins seems unnecessary. I feel like I took the "scenic" route on this one.
To anyone more experienced than me, how would you tackle this problem?
You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
A Percentage is calculated by the mathematical formula of dividing the value by the sum of all the values and then multiplying the sum by 100. This is also applicable in Pandas Dataframes. Here, the pre-defined sum() method of pandas series is used to compute the sum of all the values of a column.
In [6]: def ratioFunction(): ...: num1 = input('Enter the first number: ') ...: num1 = int(num1) # Now we are good ...: num2 = input('Enter the second number: ') ...: num2 = int(num2) # Good, good ...: ratio12 = int(num1/num2) ...: print('The ratio of', num1, 'and', num2,'is', str(ratio12) + '.
The pct_change() method returns a DataFrame with the percentage difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
We can take advantage of the way that Boolean values are handled mathematically (True
being 1
and False
being 0
) and use 3 aggregation functions sum
, count
and mean
per group (groupby aggregate
). We can also take advantage of Named Aggregation to both create and rename the columns in one step:
df = (
df.groupby('playerId', as_index=False)
.agg(wins=('winner', 'sum'),
totalCount=('winner', 'count'),
winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100
df
:
playerId wins totalCount winPct
0 1848 1 2 50.0
1 1988 0 2 0.0
2 3543 1 1 100.0
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
'playerId': [1848, 1988, 3543, 1848, 1988],
'winner': [True, False, True, False, False]
})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With