I have an object in Python with a lot of rows:
INPUT:
Team1 Player1 idTrip13 133
Team2 Player333 idTrip10 18373
Team3 Player22 idTrip12 17338899
Team2 Player293 idTrip02 17656
Team3 Player20 idTrip11 1883
Team1 Player1 idTrip19 19393
and I need to aggregate this data (like a pivot table).
OUTPUT I am working on:
Team1 Player1 : 2 trips : sum(133+19393)
Team2 Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3 Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883
Could someone suggest an appropriate Python object to use so that I can produce the following output?
print team, player, trips, time
Use the groupby function for pandas DataFrames.
Put your data into a list of lists; each inner list will be a row in the DataFrame.
In[1]:
import pandas as pd

# Each inner list is one row: team, player, trip id, time
mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns = ['team', 'player', 'trips', 'time'])
df
Out[1]:
    team     player     trips      time
0  Team1    Player1  idTrip13       133
1  Team2  Player333  idTrip10     18373
2  Team3   Player22  idTrip12  17338899
3  Team2  Player293  idTrip02     17656
4  Team3   Player20  idTrip11      1883
5  Team1    Player1  idTrip19     19393
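If the rows start out as whitespace-separated text, as in the question's INPUT, the list of lists can be built by splitting each line. This is only a sketch: the names `raw`, `rows`, and `df_from_text` are made up for illustration, and it assumes every line has exactly four fields with an integer in the last one.

```python
import pandas as pd

# Assumption: the raw data is available as one multi-line string.
raw = """\
Team1 Player1 idTrip13 133
Team2 Player333 idTrip10 18373
Team3 Player22 idTrip12 17338899
Team2 Player293 idTrip02 17656
Team3 Player20 idTrip11 1883
Team1 Player1 idTrip19 19393
"""

rows = []
for line in raw.splitlines():
    if not line.strip():
        continue  # skip blank lines
    team, player, trip, time = line.split()
    rows.append([team, player, trip, int(time)])  # convert time to int so it can be summed

df_from_text = pd.DataFrame(rows, columns=['team', 'player', 'trips', 'time'])
print(df_from_text)
```

Converting `time` to `int` matters here: if it stayed a string, `sum()` would concatenate the values instead of adding them.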
Call groupby(), pass the column you wish to use as your grouper, and apply a function to the groups.
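To see what the grouper does before any function is applied, it can help to iterate over the groups directly. The sketch below rebuilds the same data so it runs on its own; the dictionary name `sizes` is just an illustrative choice.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Iterating a groupby yields (group key, sub-DataFrame) pairs.
sizes = {}
for team, group in df.groupby('team'):
    sizes[team] = len(group)  # number of rows that fell into this group
    print(team, len(group))
```

Each `group` is an ordinary DataFrame holding only the rows for that team, which is what functions such as count() or sum() then operate on.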
Examples
Ex. 1: Find the number of trips each team went on. The column team is the grouper, and we apply the function count() on column ['trips'].
In[2]:
trip_count = df.groupby(by = ['team'])['trips'].count()
trip_count
Out[2]:
team
Team1    2
Team2    2
Team3    2
Name: trips, dtype: int64
Ex. 2 (multiple columns): Find the total time each player on a team spent traveling. We use two columns, ['team', 'player'], as the grouper, and apply the function sum() on column ['time'].
In[3]:
trip_time = df.groupby(by = ['team', 'player'])['time'].sum()
trip_time
Out[3]:
team   player
Team1  Player1         19526
Team2  Player293       17656
       Player333       18373
Team3  Player20         1883
       Player22     17338899
Name: time, dtype: int64
Ex. 3 (multiple functions): For each player on a team, find the total number of trips and total time spent traveling. We pass agg() a dictionary that maps each column to the function to apply to it.
In[4]:
player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})
player_total
Out[4]:
                 trips      time
team  player
Team1 Player1        2     19526
Team2 Player293      1     17656
      Player333      1     18373
Team3 Player20       1      1883
      Player22       1  17338899
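To get from this result to the "team, player, trips, time" lines the question asks for, one option is to flatten the index and loop over the rows. This is a sketch, not the only way to do it: it uses pandas named aggregation (available in pandas 0.25 and later) instead of the dictionary form, and the name `summary` is an illustrative choice.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: output_column=(input_column, function)
summary = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()  # turn the team/player index levels back into columns
)

# One line per (team, player) pair, in the order the question asked for
for row in summary.itertuples(index=False):
    print(row.team, row.player, row.trips, row.time)
```

After reset_index(), `summary` is a flat DataFrame with one row per player, so it can also be written out with to_csv() or filtered like any other DataFrame.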