The title may not be that clear, since it was difficult to summarize the problem in a few words, although I don't think the problem is that difficult to solve. To explain the problem, let me share a dataframe for reference:
head(df, n = 10)
team score
1 A 10
2 A 4
3 A 10
4 A 16
5 A 20
6 B 5
7 B 11
8 B 8
9 B 16
10 B 5
I'd like to add a third column, that calculates the average score for each team, with the average score updating as I go down the rows for each team, and then resetting at a new team. For example, the output column I am hoping for would look like this:
head(df, n = 10)
team score avg_score
1 A 10 10
2 A 4 7
3 A 10 8
4 A 16 10
5 A 20 12
6 B 5 5
7 B 11 8
8 B 8 8
9 B 16 10
10 B 5 9
# row1: 10 = 10
# row2: 7 = (10 + 4)/2
# row3: 8 = (10 + 4 + 10)/3
# ...
with the pattern following, and the calculation restarting for a new team.
Thanks,
To get column average or mean from pandas DataFrame use either mean() and describe() method. The DataFrame. mean() method is used to return the mean of the values for the requested axis.
rowMeans() function in R Language is used to find out the mean of each row of a data frame, matrix, or array.
The row average can be found using DataFrame. mean() function. It returns the mean of the values over the requested axis.
In this method of computing, the mean of the given dataframe column user just need to call the colMeans function which is an in-build function in R language and pass the dataframe as its parameter, then this will be returning the mean of all the column in the provided dataframe to the user.
library("data.table")
setDT(df)[, `:=` (avg_score = cumsum(score)/1:.N), by = team]
or more readable as per the comment by @snoram
setDT(dt)[, avg_score := cumsum(score)/(1:.N), by = team]
# team score avg_score
# 1: A 10 10
# 2: A 4 7
# 3: A 10 8
# 4: A 16 10
# 5: A 20 12
# 6: B 5 5
# 7: B 11 8
# 8: B 8 8
# 9: B 16 10
# 10: B 5 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With