
How to group by a fixed number of rows in dplyr? [duplicate]

Tags:

r

dplyr

I have a data frame:

set.seed(123)
x <- sample(10)
y <- x^2
my.df <- data.frame(x, y)

The result is this:

> my.df
    x   y
1   3   9
2   8  64
3   4  16
4   7  49
5   6  36
6   1   1
7  10 100
8   9  81
9   2   4
10  5  25

I want to split the rows into consecutive chunks of n rows and compute the mean, sum, or any other summary on each chunk. Something like this for n = 5:

my.df %>% group_by(5) %>% summarise(sum = sum(y), mean = mean(y))

The expected output would be something like:

# A tibble: 2 x 2
     sum   mean
   <dbl>  <dbl>
1    174   34.8
2    211   42.2

Of course, the number of rows in the data frame could be 15, 20, 100, or anything else. I still want to group the data every n rows.

How can I do this?

Ben asked Mar 03 '19 11:03




1 Answer

We can use rep or gl to create the grouping variable:

library(dplyr)
my.df %>% 
    # gl(n(), 5, n()) repeats each factor level 5 times and truncates to n() values
    group_by(grp = as.integer(gl(n(), 5, n()))) %>% 
    # or with rep:
    # group_by(grp = rep(row_number(), each = 5, length.out = n())) %>%
    summarise(sum = sum(y), mean = mean(y))
# A tibble: 2 x 3
#    grp   sum  mean
#  <int> <dbl> <dbl>
#1     1   174  34.8
#2     2   211  42.2
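
If the chunk size should be a parameter rather than a hard-coded 5, the same grouping index can also be built with integer division on the row number. The following is only a minimal sketch, not part of the original answer: the name `n_rows` is an illustrative assumption, and the data frame is rebuilt from the values printed in the question so the result does not depend on how `sample()` behaves in a given R version.

```r
library(dplyr)

# Rebuild the data frame from the values shown in the question
my.df <- data.frame(x = c(3, 8, 4, 7, 6, 1, 10, 9, 2, 5))
my.df$y <- my.df$x^2

n_rows <- 5  # hypothetical name for the chunk size

res <- my.df %>%
    # rows 1..5 -> group 1, rows 6..10 -> group 2, and so on;
    # if nrow is not a multiple of n_rows, the last group is just shorter
    group_by(grp = (row_number() - 1) %/% n_rows + 1) %>%
    summarise(sum = sum(y), mean = mean(y))

res
```

With the question's data this gives the same sums (174 and 211) and means (34.8 and 42.2) as the gl and rep versions; the only difference is that grp comes out as a double rather than an integer.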
akrun answered Nov 15 '22 00:11