
How to pass column names into a function dplyr

Tags: function, r, dplyr

I'm trying to create a simple summary function to speed up the reporting of multiple columns of data for use in an R Markdown file.

var1 is a categorical column of data, t_var is an integer representing the quarter of data, and dt is the full data.

summarise_data_categorical <- function(var1, t_var, dt){

  print(var1)
  print(t_var)

  #Select the columns to aggregate
  group_func <- dt %>% 
    select(one_of(t_var, var1)) %>%
    group_by(t_var,var1)

  #create simple count summary
  count_table <- group_func %>%
    summarise(count = n()) %>%
    spread(t_var, count)

  #create a frequency version of the same table...
  freq <- dt %>%
    select(t_var, var1) %>%
    group_by(t_var,var1) %>%
    summarise(count = n()) %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)

  #Present that table
  freq_table <- freq %>%
    spread(t_var, freq)

  #Create the chart to do the same thing..
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes(x=t_var, y = freq, colour=var1))

  #Compile outputs as a list
  results <- list(count_table, freq_table, freq_chart)

  #Return list
  results

}

Say I've got a frame:

fr <- data.frame(lets = sample(LETTERS, 100, replace=TRUE),
           `quarter type` = sample(1:4, 100, replace=TRUE))

If I run the function, thus:

summarise_data_categorical("lets", "quarter type", fr)

The initial output is promising:

[1] "lets"
[1] "quarter type"

(NOTE: in trying to recreate the data, for some reason I also receive the warning:

Unknown variables: quarter type

although this doesn't appear with my original data.)

The main thing is I get an error:

Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : t_var

Having come from Python, I'm still a bit confused on how to refer to columns. Can someone explain how I can fix what I've got wrong?

asked Apr 16 '17 13:04 by elksie5000




1 Answer

We can use the new quosures from the devel version of dplyr (soon to be released in 0.6.0)

summarise_data_categorical <- function(var1, t_var, dt){

  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var) 

  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n()) 

}
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]

#   quartertype   lets count
#         <int> <fctr> <int>
#1            1      A     1
#2            1      F     2
#3            1      G     2
#4            1      H     1
#5            1      I     1
#6            1      J     4
#7            1      M     3
#8            1      N     1
#9            1      P     1
#10           1      S     5
# ... with 55 more rows
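
Note that the call above uses a column named quartertype, while the frame in the question was built with `quarter type`. By default data.frame() repairs non-syntactic names, which is the likely source of the "Unknown variables: quarter type" warning. A small sketch of that behaviour (fr_default and fr_kept are names made up here for illustration):

```r
# data.frame() runs names through make.names() unless told otherwise,
# so a name containing a space is silently changed.
fr_default <- data.frame(`quarter type` = 1:4)
names(fr_default)
# [1] "quarter.type"

# check.names = FALSE keeps the name exactly as written.
fr_kept <- data.frame(`quarter type` = 1:4, check.names = FALSE)
names(fr_kept)
# [1] "quarter type"
```

Either rename the column to something syntactic (as assumed in this answer) or build the frame with check.names = FALSE.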

enquo works much like substitute from base R: it captures the input argument unevaluated and converts it to a quosure. one_of takes string arguments, so each quosure is converted to a string with quo_name. Inside group_by/summarise/mutate etc., we evaluate the quosure by unquoting it (UQ or !!).
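
To see the three pieces in isolation, here is a minimal sketch on the built-in mtcars data. count_by is a hypothetical helper written only for this illustration, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: count rows per level of one column,
# where the column is passed as a bare (unquoted) name.
count_by <- function(dt, col) {
  col <- enquo(col)            # capture the argument as a quosure
  col_name <- quo_name(col)    # string form of the same column, e.g. "cyl"

  dt %>%
    select(one_of(col_name)) %>%  # one_of() wants strings
    group_by(!!col) %>%           # !! unquotes the quosure for group_by()
    summarise(count = n())
}

res <- count_by(mtcars, cyl)
res
```

The result has one row per distinct value of cyl and the counts add up to nrow(mtcars).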


Quosures seem to work fine with dplyr, though there is some difficulty in implementing the same with tidyr functions, so the version below passes column names as strings to spread_ and aes_string. The following should work for the full function:

summarise_data_categorical <- function(var1, t_var, dt){

  var1 <- enquo(var1)
  t_var <- enquo(t_var)

  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

  count_table <- Summ_func %>%
    spread_(v2, "count")

  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)

  freq_table <- freq %>%
    spread_(v2, "freq")

  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes_string(x= v2 , y = "freq", colour= v1))

  results <- list(count_table, freq_table, freq_chart)
  results

}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
#     lets   `1`   `2`   `3`   `4`
#*  <fctr> <int> <int> <int> <int>
#1       A    NA    NA     1     2
#2       B     2    NA    NA     1
#3       C     1     5     1     2
#4       E     1     1    NA    NA
#5       G    NA     1     2     2
#6       H     1    NA     1     1
#7       I    NA     1     1     2
#8       J     2     1     1     1
#9       K     1     1     2     1
#10      L    NA     2    NA    NA
# ... with 14 more rows

#[[2]]
# A tibble: 24 × 5
#     lets   `1`   `2`   `3`   `4`
#*  <fctr> <dbl> <dbl> <dbl> <dbl>
#1       A    NA    NA   3.1   9.5
#2       B   8.7    NA    NA   4.8
#3       C   4.3  20.8   3.1   9.5
#4       E   4.3   4.2    NA    NA
#5       G    NA   4.2   6.2   9.5
#6       H   4.3    NA   3.1   4.8
#7       I    NA   4.2   3.1   9.5
#8       J   8.7   4.2   3.1   4.8
#9       K   4.3   4.2   6.2   4.8
#10      L    NA   8.3    NA    NA
# ... with 14 more rows

#[[3]]

(Image: the line chart returned as the third list element; not reproduced here.)
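
For readers on later releases: as far as I know, tidyr 0.7.0 and newer accept unquoted quosures in spread directly, so the string-based spread_ may no longer be needed there. A minimal sketch under that assumption; count_wide and the small sample frame are hypothetical names made up for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: count rows per (t_var, var1) pair and spread
# the t_var levels into columns, using bare column names throughout.
# Assumes a tidyr version whose spread() supports tidy evaluation.
count_wide <- function(dt, var1, t_var) {
  var1 <- enquo(var1)
  t_var <- enquo(t_var)

  dt %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n()) %>%
    ungroup() %>%
    spread(!!t_var, count)   # quosure unquoted in place of a string key
}

set.seed(1)
fr <- data.frame(lets = sample(LETTERS[1:3], 30, replace = TRUE),
                 quartertype = sample(1:4, 30, replace = TRUE))

wide <- count_wide(fr, lets, quartertype)
wide
```

I believe the same unquoting also works inside aes() in ggplot2 3.0.0 and later, which would replace aes_string, but treat that as something to check against your installed version.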

answered Sep 23 '22 17:09 by akrun