Can dplyr summarise over several variables without listing each one? [duplicate]

Tags: r, dplyr

dplyr is amazingly fast, but I wonder if I'm missing something: is it possible to summarise over several variables? For example:

    library(dplyr)
    library(reshape2)

    (df=dput(structure(list(sex = structure(c(1L, 1L, 2L, 2L), .Label = c("boy",
    "girl"), class = "factor"), age = c(52L, 58L, 40L, 62L), bmi = c(25L,
    23L, 30L, 26L), chol = c(187L, 220L, 190L, 204L)), .Names = c("sex",
    "age", "bmi", "chol"), row.names = c(NA, -4L), class = "data.frame")))
    #    sex age bmi chol
    # 1  boy  52  25  187
    # 2  boy  58  23  220
    # 3 girl  40  30  190
    # 4 girl  62  26  204

    dg=group_by(df,sex)

With this small dataframe, it's easy to write

    summarise(dg, mean(age), mean(bmi), mean(chol))

And I know that to get what I want, I could melt, get the means, and then dcast such as

    dm=melt(df, id.var='sex')
    dmg=group_by(dm, sex, variable)
    x=summarise(dmg, means=mean(value))
    dcast(x, sex~variable)

But what if I have >20 variables and a very large number of rows? Is there anything similar to .SD in data.table that would allow me to take the means of all variables in the grouped data frame? Or is it possible to somehow use lapply on the grouped data frame?

Thanks for any help


asked Jan 22 '14 23:01

David F

2 Answers

As has been mentioned by several folks, mutate_each() and summarise_each() are deprecated in favour of the new across() function.

Answer as of dplyr version 1.0.5:

    df %>%
      group_by(sex) %>%
      summarise(across(everything(), mean))
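With many columns, it is common to restrict `across()` to a subset and to pass extra arguments to the summary function. The following is a minimal sketch, assuming dplyr 1.0 or later; the choice of `where(is.numeric)` and `na.rm = TRUE` is illustrative and not part of the original answer. The data frame is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Toy data from the question, rebuilt here so the example is self-contained
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# Average every numeric column within each group, ignoring missing values.
# Grouping columns (here: sex) are never touched by across().
res <- df %>%
  group_by(sex) %>%
  summarise(across(where(is.numeric), function(x) mean(x, na.rm = TRUE)))

print(res)
```

On this data the result has one row per `sex` and the same group means as the `across(everything(), mean)` call, because every non-grouping column is numeric and contains no missing values.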

Original answer:

dplyr now has summarise_each:

    df %>%
      group_by(sex) %>%
      summarise_each(funs(mean))


answered Oct 19 '22 23:10

rrs

The data.table idiom is lapply(.SD, mean), which is

    library(data.table)

    DT <- data.table(df)
    DT[, lapply(.SD, mean), by = sex]
    #     sex age bmi  chol
    # 1:  boy  55  24 203.5
    # 2: girl  51  28 197.0
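When only some of the columns should be averaged, data.table's `.SDcols` argument limits what `.SD` contains. This is a small sketch under that assumption; the two columns picked here (`age` and `chol`) are arbitrary and only serve as an illustration.

```r
library(data.table)

# Toy data from the question, rebuilt here so the example is self-contained
DT <- data.table(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# .SDcols restricts .SD to the named columns before lapply() runs
res_dt <- DT[, lapply(.SD, mean), by = sex, .SDcols = c("age", "chol")]

print(res_dt)
```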

I'm not sure of a dplyr idiom for the same thing, but you can do something like

    dg <- group_by(df, sex)
    # the names of the columns you want to summarize
    cols <- names(dg)[-1]
    # the dots component of your call to summarise
    dots <- sapply(cols, function(x) substitute(mean(x), list(x = as.name(x))))
    do.call(summarise, c(list(.data = dg), dots))
    # Source: local data frame [2 x 4]
    #    sex age bmi  chol
    # 1  boy  55  24 203.5
    # 2 girl  51  28 197.0

Note that there is a GitHub issue #178 to efficiently implement the plyr idiom colwise in dplyr.
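In current dplyr the same "column names held in a character vector" pattern no longer needs `substitute()` and `do.call()`. A possible sketch, assuming dplyr 1.0 or later, where `all_of()` selects the columns named in `cols`; this is an alternative to the workaround above, not the approach this answer originally proposed.

```r
library(dplyr)

# Toy data from the question, rebuilt here so the example is self-contained
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# Names of the columns to summarise, held as plain strings
cols <- setdiff(names(df), "sex")

# all_of() turns the character vector into a column selection for across()
res2 <- df %>%
  group_by(sex) %>%
  summarise(across(all_of(cols), mean))

print(res2)
```

Because `cols` is an ordinary character vector, it can be built programmatically (for example from a pattern match on `names(df)`) when there are too many variables to list by hand.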

answered Oct 20 '22 01:10

mnel
