I need to get the mean of one column (here: score) for specific rows (here: years). Specifically, I would like to know the average score for three periods:
This is the structure of my data:
country year score
Algeria 1980 -1.1201501
Algeria 1981 -1.0526943
Algeria 1982 -1.0561565
Algeria 1983 -1.1274560
Algeria 1984 -1.1353926
Algeria 1985 -1.1734330
Algeria 1986 -1.1327666
Algeria 1987 -1.1263586
Algeria 1988 -0.8529455
Algeria 1989 -0.2930265
Algeria 1990 -0.1564207
Algeria 1991 -0.1526328
Algeria 1992 -0.9757842
Algeria 1993 -0.9714060
Algeria 1994 -1.1422258
Algeria 1995 -0.3675797
...
The calculated mean values should be added to the df in an additional column ("mean"), i.e. same mean value for years of period 1, for those of period 2 etc.
This is how it should look like:
country year score mean
Algeria 1980 -1.1201501 -1.089
Algeria 1981 -1.0526943 -1.089
Algeria 1982 -1.0561565 -1.089
Algeria 1983 -1.1274560 -1.089
Algeria 1984 -1.1353926 -0.839
Algeria 1985 -1.1734330 -0.839
Algeria 1986 -1.1327666 -0.839
Algeria 1987 -1.1263586 -0.839
Algeria 1988 -0.8529455 -0.839
Algeria 1989 -0.2930265 -0.839
Algeria 1990 -0.1564207 -0.839
...
Every possible path I tried got easily super complicated - and I have to calculate the mean scores for different periods of time for over 90 countries ...
Many many thanks for your help!
The rowMeans() function in R can be used to calculate the mean of several rows of a matrix or data frame in R.
The DataFrame.mean() method is used to return the mean of the values for the requested axis. If you apply this method on a series object, then it returns a scalar value, which is the mean value of all the observations in the pandas DataFrame.
The row average can be found using DataFrame. mean() function. It returns the mean of the values over the requested axis. If axis = 0 , the mean function is applied over the columns.
ColMeans() Function along with sapply() is used to get the mean of the multiple column. Dataframe is passed as an argument to ColMeans() Function. Mean of numeric columns of the dataframe is calculated.
datfrm$mean <-
with (datfrm, ave( score, findInterval(year, c(-Inf, 1984, 1991, Inf)), FUN= mean) )
The title question is a bit different than the real question and would be answered by using logical indexing. If one wanted only the mean for a particular subset say year >= 1984 & year <= 1990
it would be done via:
mn84_90 <- with(datfrm, mean(score[year >= 1984 & year <= 1990]) )
Since [proved wrong, thanks @DWin]. For completeness the findInterval
requires year
to be sorted (as it is in your example) I'd be tempted to use cut
in case it isn't sorteddata.table
equivalent (scales for large data) is :
require(data.table)
DT = as.data.table(DF) # or just start with a data.table in the first place
DT[, mean:=mean(score), by=cut(year,c(-Inf,1984,1991,Inf))]
or findInterval
is likely faster as DWin used :
DT[, mean:=mean(score), by=findInterval(year,c(-Inf,1984,1991,Inf))]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With