I'm interested in finding an efficient manner to obtain a summary by group table that would contain:
For example, in case of generating the descriptive statistics I'm using the code below:
data("mtcars")
require(dplyr)
mt_sum <- mtcars %>%
group_by(cyl) %>%
summarise_each(funs(min,max), hp, wt, disp)
which would generate the desired output:
> head(mt_sum)
Source: local data frame [3 x 7]
cyl hp_min wt_min disp_min hp_max wt_max disp_max
(dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 4 52 1.513 71.1 113 3.190 146.7
2 6 105 2.620 145.0 175 3.460 258.0
3 8 150 3.170 275.8 335 5.424 472.0
I'm interested in enriching the data with figure that would reflect count of values per each group. With respect to the count, this can be simply done:
mt_sum2 <- mtcars %>%
group_by(cyl) %>%
summarise(countObs = n())
which would generate the required data:
> head(mt_sum2)
Source: local data frame [3 x 2]
cyl countObs
(dbl) (int)
1 4 11
2 6 7
3 8 14
The problem occurs when I would like to simultaneously apply both transformations.
For example the code:
mt_sum <- mtcars %>%
group_by(cyl) %>%
summarise_each(funs(min,max), hp, wt, disp) %>%
summarise(countObs = n())
would generate:
Source: local data frame [3 x 2]
cyl countObs
(dbl) (int)
1 4 11
2 6 7
3 8 14
without the descriptive statistics that were previously generated.
The code:
mt_sum <- mtcars %>%
group_by(cyl) %>%
summarise_each(funs(min,max,n), hp, wt, disp)
will expectedly fail:
Error: n does not take arguments
The code:
data("mtcars")
require(dplyr)
mt_sum <- mtcars %>%
group_by(cyl) %>%
summarise_each(funs(min,max), hp, wt, disp) %>%
left_join(y = data.frame(
"Var1" = as.numeric(as.character(as.data.frame(table(mtcars$cyl))$Var1)),
"Count" = as.character(as.data.frame(table(mtcars$cyl))$Freq)),
by = c("cyl" = "Var1"))
will deliver the required data:
> head(mt_sum)
Source: local data frame [3 x 8]
cyl hp_min wt_min disp_min hp_max wt_max disp_max Count
(dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (fctr)
1 4 52 1.513 71.1 113 3.190 146.7 11
2 6 105 2.620 145.0 175 3.460 258.0 7
3 8 150 3.170 275.8 335 5.424 472.0 14
I think this is extremely inefficient way of producing this summary. In particular, creating objects on the fly is inefficient when working with big tables. I'm interested in achieving the same results but in a more efficient manner that would not involve creating objects just for the purpose of merging. In particular, what I would like to do in dplyr
would correspond to deriving additional summaries from the previous version of the table. For example:
Here's another (shorter) option using a left_join
:
mtcars %>%
group_by(cyl) %>%
summarise_each(funs(min,max), hp, wt, disp) %>%
left_join(count(mtcars, cyl))
#Joining by: "cyl"
#Source: local data frame [3 x 8]
#
# cyl hp_min wt_min disp_min hp_max wt_max disp_max n
# (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int)
#1 4 52 1.513 71.1 113 3.190 146.7 11
#2 6 105 2.620 145.0 175 3.460 258.0 7
#3 8 150 3.170 275.8 335 5.424 472.0 14
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With