
Aggregating data and getting sum and counts [closed]

I have an object in Python with a lot of rows:

INPUT :

    Team1     Player1     idTrip13     133
    Team2     Player333   idTrip10     18373
    Team3     Player22    idTrip12     17338899
    Team2     Player293   idTrip02     17656
    Team3     Player20    idTrip11     1883
    Team1     Player1     idTrip19     19393

and I need to aggregate this data (like a pivot table).

OUTPUT I am working on:

Team1   Player1 : 2 trips : sum(133+19393)
Team2   Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3   Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883

Could someone suggest the appropriate object in Python to use such that I could have the following output?

print team, player, trips, time
John Doe asked Feb 09 '23 05:02


1 Answer

Use the groupby function on a pandas DataFrame.

  1. Put your data into a list of lists; each inner list will be a row in the DataFrame.

    In[1]:
    
    import pandas as pd
    
    mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
    ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
    ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
    
    df = pd.DataFrame(mydata, columns = ['team', 'player', 'trips', 'time'])
    
    df
    Out[1]:
         team    player       trips      time
    0   Team1   Player1     idTrip13    133
    1   Team2   Player333   idTrip10    18373
    2   Team3   Player22    idTrip12    17338899
    3   Team2   Player293   idTrip02    17656
    4   Team3   Player20    idTrip11    1883
    5   Team1   Player1     idTrip19    19393
    
  2. Call groupby(), pass the column you wish to use as your grouper, and apply a function to the groups.


Examples

Ex. 1 (single column): Find the number of trips each team went on. Here team is the grouper, and we apply the function count() to the column ['trips'].

In[2]:
trip_count = df.groupby(by = ['team'])['trips'].count() 

trip_count              
Out[2]:          

 team
Team1    2
Team2    2
Team3    2
Name: trips, dtype: int64
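
A side note on Ex. 1: count() counts only the non-missing values in the selected column, whereas size() counts every row in the group. The sketch below is only an illustration; it assumes a hypothetical variant of the data in which one trip id is missing (that None is made up and is not part of the original data).

```python
import pandas as pd

# Hypothetical variant of the data: one trip id is deliberately None,
# purely to show how count() and size() differ.
rows = [
    ['Team1', 'Player1', 'idTrip13', 133],
    ['Team1', 'Player1', None, 19393],
    ['Team2', 'Player333', 'idTrip10', 18373],
]
demo = pd.DataFrame(rows, columns=['team', 'player', 'trips', 'time'])

# count() skips the missing trip id.
non_null_trips = demo.groupby('team')['trips'].count()

# size() counts every row in each group, missing values included.
rows_per_team = demo.groupby('team').size()

print(non_null_trips)
print(rows_per_team)
```

With complete data, as in the question, both give the same numbers.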

Ex. 2 (multiple columns): Find the total time each player on a team spent traveling. We use 2 columns ['team', 'player'] as the grouper, and apply the function sum() on column ['time'].

In[3]:              
trip_time = df.groupby(by = ['team', 'player'])['time'].sum() 

trip_time        
Out[3]:

 team   player   
Team1  Player1         19526
Team2  Player293       17656
       Player333       18373
Team3  Player20         1883
       Player22     17338899
Name: time, dtype: int64
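
The result of Ex. 2 is a Series indexed by (team, player) pairs. A minimal sketch of two ways one might work with it, assuming the same data as above: looking up a single pair with .loc, and turning the index levels back into ordinary columns with reset_index(). The names team1_player1 and flat are just illustrative.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

trip_time = df.groupby(by=['team', 'player'])['time'].sum()

# Look up one (team, player) pair in the two-level index.
team1_player1 = trip_time.loc[('Team1', 'Player1')]

# Move the index levels back into regular columns: team, player, time.
flat = trip_time.reset_index()

print(team1_player1)
print(flat)
```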

Ex. 3 (multiple functions): For each player on a team, find the total number of trips and total time spent traveling.

In[4]:
player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})

player_total
Out[4]:
                 trips  time
team    player      
Team1   Player1     2   19526
Team2   Player293   1   17656
        Player333   1   18373
Team3   Player20    1   1883
        Player22    1   17338899
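
To get one "team, player, trips, time" line per player, as asked in the question, one option is to flatten the aggregated result and loop over its rows. This is a sketch rather than the only way to do it; it assumes pandas 0.25 or later (for the keyword form of agg()) and Python 3 (for print() and f-strings).

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: output_column=(input_column, function).
player_total = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()  # team and player become ordinary columns
)

# Build one text line per (team, player) row.
lines = []
for row in player_total.itertuples(index=False):
    lines.append(f"{row.team} {row.player} {row.trips} {row.time}")

print("\n".join(lines))
```

The first printed line is "Team1 Player1 2 19526", matching the sum 133 + 19393 from the question.
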
ilyas patanam answered Feb 12 '23 11:02