Calculate the mean of every 13 rows in data frame

Tags: split, dataframe, r

I have a data frame df with 2 columns and 3659 rows.

I am trying to reduce the data set by averaging every 10 or 13 rows of this data frame, so I tried the following:

# number of rows per group
n <- 13
# number of groups (not a whole number when nrow(df) is not a multiple of n)
n_grp <- nrow(df) / n
round(n_grp, 0)
# row indices (one vector per group)
idx_grp <- split(seq_len(nrow(df)), rep(seq(n_grp), each = n))

# calculate the column means for all groups
res <- lapply(idx_grp, function(i) {
  # subset of the data frame (rows of this group)
  tmp <- df[i, ]
  # calculate column means
  colMeans(tmp, na.rm = TRUE)
})
# transform list into a data frame
dat2 <- as.data.frame(res)

However, my number of rows is not divisible by 10 or 13, so split() complains that "data length is not a multiple of split variable". I am not sure what to do about this. I would simply like the last group to be averaged as well, even if it has fewer than 10 elements.

I also tried this one, but the results are the same:

df1=split(df, sample(rep(1:301, 10)))
yuliaUU asked May 20 '15 20:05


2 Answers

Here's a solution using aggregate() and rep().

df <- data.frame(a=1:12, b=13:24 );
df;
##     a  b
## 1   1 13
## 2   2 14
## 3   3 15
## 4   4 16
## 5   5 17
## 6   6 18
## 7   7 19
## 8   8 20
## 9   9 21
## 10 10 22
## 11 11 23
## 12 12 24
n <- 5;
aggregate(df, list(rep(1:(nrow(df) %/% n + 1), each = n, len = nrow(df))), mean)[-1];
##      a    b
## 1  3.0 15.0
## 2  8.0 20.0
## 3 11.5 23.5

The important part of this solution that handles the issue of non-divisibility of nrow(df) by n is specifying the len parameter (actually the full parameter name is length.out) of rep(), which automatically caps the group vector to the appropriate length.
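
To see why this works, it can help to look at the grouping vector on its own. The following is a minimal sketch on the same 12-row example; the names grp and res are made up for illustration, and ceiling(nrow(df) / n) is swapped in here as an alternative group count (an assumption, not part of the answer above) that avoids generating an unused extra group id when nrow(df) happens to be an exact multiple of n.

```r
df <- data.frame(a = 1:12, b = 13:24)
n <- 5

# one group id per row: five 1s, five 2s, then two 3s;
# length.out truncates the last group to fit nrow(df)
grp <- rep(seq_len(ceiling(nrow(df) / n)), each = n, length.out = nrow(df))
table(grp)

# same aggregation as above, with the grouping vector built separately
res <- aggregate(df, list(grp = grp), mean)[-1]
res
```

The last group simply contains whatever rows are left over, so its mean is taken over fewer than n rows.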

bgoldst answered Sep 19 '22 15:09


If df is a data.table, you can use %/% to group as in

library(data.table)
setDT(df)
n <- 13 # every 13 rows

# z stands for the column to average
df[, mean(z), by = (seq(nrow(df)) - 1) %/% n]

If you instead want to group every nth row together (rows 1, n + 1, 2n + 1, ... in one group), use %% instead of %/%:

df[, mean(z), by= (seq(nrow(df)) - 1) %% n]
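
The difference between the two operators is easiest to see on the group ids themselves. Here is a small base R sketch with 7 rows and n = 3; the names i, block_id and cycle_id are hypothetical and chosen only for this illustration.

```r
n <- 3
i <- seq_len(7)  # row numbers 1..7

# integer division: consecutive blocks of n rows share a group id
block_id <- (i - 1) %/% n
block_id

# modulo: rows i, i + n, i + 2n, ... share a group id
cycle_id <- (i - 1) %% n
cycle_id
```

With %/%, the leftover seventh row forms its own, shorter group, which is the behaviour asked for in the question.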
Ricardo Saporta answered Sep 17 '22 15:09