I'm using the code below to generate a simple summary table:
# Data
data("mtcars")
# Lib
require(dplyr)
# Summary
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1"))
The code produces the desired results:
> head(mt_sum)
Source: local data frame [2 x 10]
am mpg_min cyl_min mpg_mean cyl_mean mpg_median cyl_median mpg_max cyl_max Freq
(chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (int)
1 0 10.4 4 17.14737 6.947368 17.3 8 24.4 8 19
2 1 15.0 4 24.39231 5.076923 22.8 4 33.9 8 13
However, I'm not satisfied with the way the columns are ordered. In particular, I would like to:
Order columns by name
Achieve that via select() in dplyr
The desired order would look like this:
> names(mt_sum)[order(names(mt_sum))]
[1] "am" "cyl_max" "cyl_mean" "cyl_median" "cyl_min" "Freq" "mpg_max"
[8] "mpg_mean" "mpg_median" "mpg_min"
Ideally, I would like to pass names(mt_sum)[order(names(mt_sum))] to select() as the column order. But the code:
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) %>%
  select(names(.)[order(names(.))])
returns the following error:
Error: All select() inputs must resolve to integer column positions. The following do not: * names(.)[order(names(.))]
In my real data I'm generating a vast number of summary columns. Hence my question: how can I dynamically pass sorted column names to select() in dplyr so that it understands them and applies them to the data.frame at hand?
My focus is on figuring out a way of passing the dynamically generated column names to select(). I know that I could sort the columns in base or by typing names, as discussed here.
All you need is:
mt_sum %>% select(order(names(.)))
#Source: local data frame [2 x 10]
#
# am cyl_max cyl_mean cyl_median cyl_min Freq mpg_max mpg_mean mpg_median mpg_min
# (chr) (dbl) (dbl) (dbl) (dbl) (int) (dbl) (dbl) (dbl) (dbl)
#1 0 8 6.947368 8 4 19 24.4 17.14737 17.3 10.4
#2 1 8 5.076923 4 4 13 33.9 24.39231 22.8 15.0
It works because order() returns integer column positions, as required by select().
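For comparison, here is a minimal sketch of both ways to hand a computed column order to select(). The data frame df and its values are made up for illustration, and the all_of() variant assumes a newer dplyr (1.0.0 or later) than the one that produced the error in the question.

```r
library(dplyr)

# Hypothetical one-row data frame with deliberately unsorted column names
df <- data.frame(mpg_min = 10.4, cyl_max = 8, am = "0", Freq = 19L,
                 stringsAsFactors = FALSE)

# Variant 1: order() yields integer positions, which select() accepts
by_position <- df %>% select(order(names(.)))

# Variant 2 (assumes dplyr >= 1.0.0): pass the sorted names as a
# character vector, wrapped in all_of()
by_name <- df %>% select(all_of(sort(names(.))))

# Both variants give the same column order
identical(names(by_position), names(by_name))
```

The exact order depends on the locale's collation rules (for example, whether "Freq" sorts before or after "am"), so the two variants are compared with each other rather than against a fixed list.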
You're definitely on the right path.
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) %>%
  .[, names(.)[order(names(.))]]
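The last step above is plain base R bracket indexing with a character vector, so it does not depend on select() at all. A small sketch on a made-up data frame (df is hypothetical, not the question's data):

```r
# Hypothetical data frame with columns out of alphabetical order
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

# Index columns by a sorted character vector of their names;
# drop = FALSE keeps the result a data frame even with a single column
sorted <- df[, sort(names(df)), drop = FALSE]

names(sorted)
# [1] "a" "b" "c"
```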