I'm new to pandas and decided to learn it by playing around with some data I pulled from my favorite game's API. I have a dataframe with two columns "playerId" and "winner" like so:
playerStatus:
______________________
   playerId   winner
0    1848      True
1    1988      False
2    3543      True
3    1848      False
4    1988      False
...
Each row represents a match the player participated in. My goal is to either transform this dataframe or create a new one such that the win percentage for each playerId is calculated. For example, the above dataframe would become:
playerWinsAndTotals
_________________________________________
   playerId   wins  totalPlayed   winPct
0    1848      1        2         50.0000
1    1988      0        2         0.0000
2    3543      1        1         100.0000
...
It took quite a while of reading pandas docs, but I actually managed to achieve this by essentially creating two different tables (one to find the number of wins for each player, one to find the total games for each player), and merging them, then taking the ratio of wins to games played.
Creating the "wins" dataframe:
temp_df = playerStatus[['playerId', 'winner']].value_counts().reset_index(name='wins')
onlyWins = temp_df[temp_df['winner'] == True][['playerId', 'wins']]
onlyWins
_________________________
    playerId    wins
1     1670       483
3     1748       474
4     2179       468
6     4006       434
8     1668       392
...
Creating the "totals" dataframe:
totalPlayed = playerStatus['playerId'].value_counts().reset_index(name='totalCount').rename(columns={'index': 'playerId'})
totalPlayed
____________________
   playerId   totalCount
0    1670        961
1    1748        919
2    1872        877
3    4006        839
4    2179        837
...
Finally, merging them and adding the "winPct" column.
playerWinsAndTotals = onlyWins.merge(totalPlayed, on='playerId', how='left')
playerWinsAndTotals['winPct'] = playerWinsAndTotals['wins']/playerWinsAndTotals['totalCount'] * 100
playerWinsAndTotals
_____________________________________________
   playerId   wins   totalCount     winPct
0    1670      483      961       50.260146
1    1748      474      919       51.577802
2    2179      468      837       55.913978
3    4006      434      839       51.728248
4    1668      392      712       55.056180
...
Now, the reason I am posting this here is because I know I'm not taking full advantage of what pandas has to offer. Creating and merging two different dataframes just to find the ratio of player wins seems unnecessary. I feel like I took the "scenic" route on this one.
To anyone more experienced than me, how would you tackle this problem?
You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
A Percentage is calculated by the mathematical formula of dividing the value by the sum of all the values and then multiplying the sum by 100. This is also applicable in Pandas Dataframes. Here, the pre-defined sum() method of pandas series is used to compute the sum of all the values of a column.
In [6]: def ratioFunction(): ...: num1 = input('Enter the first number: ') ...: num1 = int(num1) # Now we are good ...: num2 = input('Enter the second number: ') ...: num2 = int(num2) # Good, good ...: ratio12 = int(num1/num2) ...: print('The ratio of', num1, 'and', num2,'is', str(ratio12) + '.
The pct_change() method returns a DataFrame with the percentage difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
We can take advantage of the way that Boolean values are handled mathematically (True being 1 and False being 0) and use 3 aggregation functions sum, count and mean per group (groupby aggregate). We can also take advantage of Named Aggregation to both create and rename the columns in one step:
df = (
    df.groupby('playerId', as_index=False)
        .agg(wins=('winner', 'sum'),
             totalCount=('winner', 'count'),
             winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100
df:
   playerId  wins  totalCount  winPct
0      1848     1           2    50.0
1      1988     0           2     0.0
2      3543     1           1   100.0
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With