Create new rows based on max rows of a column

Question

So I'm trying to create new data in a time series based on past data. For example I have player data here with each row being stats accumulated at a certain age. I want to create new row in the Dataframe where I increment the max age by one and then take the average of the sa and ga column from the two years before that.

Here is the data

import pandas as pd

data = [['Adam Wilcox', 8476330, 25, 14.0, 0.0],
        ['Adin Hill', 8478499, 21, 129.0, 14.0],
        ['Adin Hill', 8478499, 22, 322.0, 32.0],
        ['Adin Hill', 8478499, 23, 343.0, 28.0],
        ['Adin Hill', 8478499, 24, 530.0, 46.0],
        ['Adin Hill', 8478499, 25, 237.0, 26.0],
        ['Al Montoya', 8471219, 24, 120.0, 9.0],
        ['Al Montoya', 8471219, 26, 585.0, 46.0],
        ['Al Montoya', 8471219, 27, 832.0, 89.0],
        ['Al Montoya', 8471219, 28, 168.0, 17.0]]

model_df = pd.DataFrame(data, 
                         columns=['player', 'player_id', 'season_age', 'sa', 'ga'])

For example what I would want to create is ['Al Montoya', 8471219, 29, 500, 53] (remember the last two values is the average of the sa and ga columns from ages 28 and 27).

I've accomplished this using iterrows and creating a new Dataframe and appending like this:

max_ages = model_df.groupby(['player', 'player_id'])[['season_age']].max().reset_index()
added_ages = []
for player in max_ages.iterrows():

    row = [player[1][0],
           player[1][1],
           player[1][2] + 1, 
           (model_df[(model_df['player_id'] == player[1][1]) &
                    (model_df['season_age'] == player[1][2] - 1)]['sa'].sum() +
           model_df[(model_df['player_id'] == player[1][1]) &
                    (model_df['season_age'] == player[1][2] - 2)]['sa'].sum())/2,
           (model_df[(model_df['player_id'] == player[1][1]) &
                    (model_df['season_age'] == player[1][2] - 1)]['ga'].sum() +
           model_df[(model_df['player_id'] == player[1][1]) &
                    (model_df['season_age'] == player[1][2] - 2)]['ga'].sum())/2
          ]
    added_ages.append(row)

added_ages_df = pd.DataFrame(added_ages, 
                             columns=['player', 'player_id', 'season_age', 'sa', 'ga'])
model_df = pd.concat([model_df, added_ages_df])

Obviously this is an adhoc solution that is very brittle, my question is if there is a built in way in pandas of doing this without using iterrows

The expected Dataframe would look like this easier to represent in list form

data = [['Adam Wilcox', 8476330, 25, 14.0, 0.0],
        ['Adin Hill', 8478499, 21, 129.0, 14.0],
        ['Adin Hill', 8478499, 22, 322.0, 32.0],
        ['Adin Hill', 8478499, 23, 343.0, 28.0],
        ['Adin Hill', 8478499, 24, 530.0, 46.0],
        ['Adin Hill', 8478499, 25, 237.0, 26.0],
        ['Adin Hill', 8478499, 26, 502, 36],
        ['Al Montoya', 8471219, 24, 120.0, 9.0],
        ['Al Montoya', 8471219, 26, 585.0, 46.0],
        ['Al Montoya', 8471219, 27, 832.0, 89.0],
        ['Al Montoya', 8471219, 28, 168.0, 17.0],
        ['Al Montoya', 8471219, 29, 500, 53]]

Derek O · Accepted Answer

You can define a function called add_row and pass it to a groupby. I'll assume that if there aren't two years of data for a player, you'll want the columns sa and ga to be populated with NaN:

def add_row(x):
    last_row = x.iloc[-1]
    last_row['season_age'] = last_row['season_age']+1
    if len(x) < 2:
        last_row['sa'], last_row['ga'] = float("nan"), float("nan")
        return x.append(last_row)
    else:
        last_row['sa'], last_row['ga'] = x[['sa','ga']].iloc[-2:].mean()
        return x.append(last_row)

new_model_df = model_df.groupby("player").apply(add_row).reset_index(drop=True)

Output:

>>> new_model_df
         player  player_id  season_age     sa    ga
0   Adam Wilcox    8476330          25   14.0   0.0
1   Adam Wilcox    8476330          26    NaN   NaN
2     Adin Hill    8478499          21  129.0  14.0
3     Adin Hill    8478499          22  322.0  32.0
4     Adin Hill    8478499          23  343.0  28.0
5     Adin Hill    8478499          24  530.0  46.0
6     Adin Hill    8478499          25  237.0  26.0
7     Adin Hill    8478499          26  383.5  36.0
8    Al Montoya    8471219          24  120.0   9.0
9    Al Montoya    8471219          26  585.0  46.0
10   Al Montoya    8471219          27  832.0  89.0
11   Al Montoya    8471219          28  168.0  17.0
12   Al Montoya    8471219          29  500.0  53.0

sammywemmy · Answer

Do some computation on the grouped object, and merge the result to the model_df:

grouper = ['player', 'player_id']
grouped = model_df.groupby(grouper, sort = False)
tail = grouped.nth(-1) # get the last row per group
tail = tail.assign(season_age = tail.season_age + 1)
# get the average of the last two columns with rolling
# a second groupby is called here to get single rows per group
sa_ga = (group[['sa', 'ga']]
         .rolling(2)
         .mean()
         .groupby(grouper)
         .nth(-1)
         )

tail = tail.assign(**sa_ga).reset_index()

# final output
(pd.concat([model_df, tail])
   .sort_values(grouper, ignore_index = True)
)

         player  player_id  season_age     sa    ga
0   Adam Wilcox    8476330          25   14.0   0.0
1   Adam Wilcox    8476330          26    NaN   NaN
2     Adin Hill    8478499          21  129.0  14.0
3     Adin Hill    8478499          22  322.0  32.0
4     Adin Hill    8478499          23  343.0  28.0
5     Adin Hill    8478499          24  530.0  46.0
6     Adin Hill    8478499          25  237.0  26.0
7     Adin Hill    8478499          26  383.5  36.0
8    Al Montoya    8471219          24  120.0   9.0
9    Al Montoya    8471219          26  585.0  46.0
10   Al Montoya    8471219          27  832.0  89.0
11   Al Montoya    8471219          28  168.0  17.0
12   Al Montoya    8471219          29  500.0  53.0

Create new rows based on max rows of a column

Tags:

python

pandas

dataframe

Matthew Barlowe

2 Answers

Derek O

sammywemmy

Recent Activity

Donate For Us

Create new rows based on max rows of a column

Tags:

python

pandas

dataframe

Matthew Barlowe

2 Answers

Derek O

sammywemmy

Related questions

Recent Activity

Donate For Us