
Calculate the mean of every 13 rows in data frame

Tags: split, dataframe, r

I have a data frame df with 2 columns and 3659 rows.

I am trying to reduce the data set by averaging every 10 or 13 rows of this data frame, so I tried the following:

# number of rows per group
n=13
# number of groups
n_grp=nrow(df)/n
round(n_grp,0)
# row indices (one vector per group)
idx_grp <- split(seq(df), rep(seq(n_grp), each = n))

# calculate the col means for all groups
res <- lapply(idx_grp, function(i) {
  # subset of the data frame
  tmp <- dat[i]
  # calculate row means
  colMeans(tmp, na.rm = TRUE)
})
# transform list into a data frame
dat2 <- as.data.frame(res)

However, I can't divide my number of rows evenly by 10 or 13, because the data length is not a multiple of the group size. So I am not sure what to do. (I would simply like to calculate the mean of the last group as well, even if it has fewer than 10 elements.)

I also tried this one, but the results are the same:

df1=split(df, sample(rep(1:301, 10)))

757

asked May 20 '15 20:05

yuliaUU

2 Answers

Here's a solution using aggregate() and rep().

df <- data.frame(a=1:12, b=13:24 );
df;
##     a  b
## 1   1 13
## 2   2 14
## 3   3 15
## 4   4 16
## 5   5 17
## 6   6 18
## 7   7 19
## 8   8 20
## 9   9 21
## 10 10 22
## 11 11 23
## 12 12 24
n <- 5;
aggregate(df, list(rep(1:(nrow(df) %/% n + 1), each = n, len = nrow(df))), mean)[-1];
##      a    b
## 1  3.0 15.0
## 2  8.0 20.0
## 3 11.5 23.5

The key to handling the case where nrow(df) is not divisible by n is the len argument of rep() (a partial match for the full argument name, length.out), which truncates the group vector to exactly nrow(df) elements. The last group then simply contains fewer than n rows.
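To see why this works, it can help to look at the grouping vector on its own. The following is a minimal sketch that reuses the 12-row df and n from above, with length.out spelled out in full. Computing the number of groups with ceiling() and the names grp and res are assumptions made for illustration, not part of the original answer.

```r
df <- data.frame(a = 1:12, b = 13:24)
n <- 5

# Group labels: 1 repeated n times, 2 repeated n times, ...,
# truncated to nrow(df) elements by length.out
grp <- rep(seq_len(ceiling(nrow(df) / n)), each = n, length.out = nrow(df))
grp
## [1] 1 1 1 1 1 2 2 2 2 2 3 3

# Average every column within each group, then drop the group column
res <- aggregate(df, list(group = grp), mean)[-1]
res
##      a    b
## 1  3.0 15.0
## 2  8.0 20.0
## 3 11.5 23.5
```

The third group holds only rows 11 and 12, so its means are taken over two rows instead of five.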

107

answered Sep 19 '22 15:09

bgoldst

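If df is a data.table, you can use %/% (integer division) on the row index to group, as in the following, where z stands for the column you want to average: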

library(data.table)
setDT(df)
n <- 13 # every 13 rows

df[, mean(z), by= (seq(nrow(df)) - 1) %/% n]

If you instead want to group every nth row together (rows 1, n+1, 2n+1, ... in one group, and so on), use %% instead of %/%:

df[, mean(z), by= (seq(nrow(df)) - 1) %% n]
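The difference between the two operators is easiest to see on the index vectors themselves. Here is a small base R sketch that does not need data.table; the 7-row length, n = 3, and the names i, blocks and strided are made up for illustration.

```r
n <- 3
i <- seq_len(7) - 1   # zero-based row positions 0..6

# Integer division: consecutive blocks of n rows share a group label
blocks <- i %/% n
blocks
## [1] 0 0 0 1 1 1 2

# Modulo: rows that are n apart share a group label (rows 1, 4, 7 together)
strided <- i %% n
strided
## [1] 0 1 2 0 1 2 0
```

With %/%, the final block is shorter when the number of rows is not a multiple of n, so no special handling of the remainder is needed.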

answered Sep 17 '22 15:09

Ricardo Saporta
