I have a data frame:
set.seed(123)
x <- sample(10)
y <- x^2
my.df <- data.frame(x, y)
The result is this:
> my.df
x y
1 3 9
2 8 64
3 4 16
4 7 49
5 6 36
6 1 1
7 10 100
8 9 81
9 2 4
10 5 25
What I want is to group the rows into blocks of n consecutive rows and compute the mean, sum, or whatever on each block. Something like this for n = 5:
my.df %>% group_by(5) %>% summarise(sum = sum(y), mean = mean(y))
The expected output would be something like:
# A tibble: 2 x 2
sum mean
<dbl> <dbl>
1 174 34.8
2 211 42.2
Of course, the number of rows in the data frame could be 15, 20, 100, whatever. I still want to group the data every n rows.
How can I do this?
group_by() takes an existing tbl (such as a data frame) and converts it into a grouped tbl, based on the columns or expressions passed as arguments. Subsequent operations such as summarise() are then performed "by group".
dplyr is a package for making tabular data wrangling easier by using a limited set of functions that can be combined to extract and summarize insights from your data. It pairs nicely with tidyr which enables you to swiftly convert between different data formats (long vs. wide) for plotting and analysis.
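As a small illustration of grouping by a column, the sketch below uses the built-in mtcars data set. It is not part of the question and is chosen here only as an example. It computes one summary row per group.

```r
library(dplyr)

# mtcars ships with R; cyl (number of cylinders) takes the values 4, 6 and 8
by_cyl <- mtcars %>%
  group_by(cyl) %>%                          # one group per distinct cyl value
  summarise(n = n(), mean_mpg = mean(mpg))   # one output row per group

by_cyl
```

The question needs the same mechanism, except that the grouping variable has to be computed from the row position rather than taken from an existing column.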
We can use rep or gl to create the grouping variable:
library(dplyr)
my.df %>%
  group_by(grp = as.integer(gl(n(), 5, n()))) %>%
  # or with rep
  # group_by(grp = rep(row_number(), length.out = n(), each = 5)) %>%
  summarise(sum = sum(y), mean = mean(y))
# A tibble: 2 x 3
# grp sum mean
# <int> <dbl> <dbl>
#1 1 174 34.8
#2 2 211 42.2
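Another common way to build the same grouping variable is integer division on the row number. The following is a minimal sketch, not the original answer's method. It rebuilds the data frame by typing in the x values shown in the question, so it does not depend on what sample() returns in a given R version, and it uses an illustrative variable n for the group size.

```r
library(dplyr)

# Rebuild the data frame from the question with the x values typed in directly
x <- c(3, 8, 4, 7, 6, 1, 10, 9, 2, 5)
my.df <- data.frame(x = x, y = x^2)

n <- 5  # group size (illustrative name, not from the original answer)

res <- my.df %>%
  # rows 1..n get grp 1, rows n+1..2n get grp 2, and so on
  group_by(grp = (row_number() - 1) %/% n + 1) %>%
  summarise(sum = sum(y), mean = mean(y))

res
```

If the number of rows is not a multiple of n, the last group simply contains the leftover rows. For example, with 12 rows and n = 5, the third group would hold only the final two rows.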