Convert string to symbol accepted by dplyr in function

Tags: r, dplyr

My data frame looks like:

> str(b)
'data.frame':   2720 obs. of  3 variables:
 $ Hospital.Name: chr  "SOUTHEAST ALABAMA MEDICAL CENTER" "MARSHALL MEDICAL CENTER SOUTH" "ELIZA COFFEE MEMORIAL HOSPITAL" "ST VINCENT'S EAST" ...
 $ State        : chr  "AL" "AL" "AL" "AL" ...
 $ heart attack : num  14.3 18.5 18.1 17.7 18 15.9 19.6 17.3 17.8 17.5 ...

I want to group it by State, sort it by State and heart attack, and then add a column that returns the row number within each group. The ideal result would look like:

# A tibble: 2,720 x 4
# Groups:   State [54]
                      Hospital.Name State `heart attack`  rank
                              <chr> <chr>          <dbl> <int>
 1 PROVIDENCE ALASKA MEDICAL CENTER    AK           13.4     1
 2         ALASKA REGIONAL HOSPITAL    AK           14.5     2
 3      FAIRBANKS MEMORIAL HOSPITAL    AK           15.5     3
 4     ALASKA NATIVE MEDICAL CENTER    AK           15.7     4
 5   MAT-SU REGIONAL MEDICAL CENTER    AK           17.7     5
 6         CRESTWOOD MEDICAL CENTER    AL           13.3     1
 7      BAPTIST MEDICAL CENTER EAST    AL           14.2     2
 8 SOUTHEAST ALABAMA MEDICAL CENTER    AL           14.3     3
 9               GEORGIANA HOSPITAL    AL           14.5     4
10      PRATTVILLE BAPTIST HOSPITAL    AL           14.6     5
# ... with 2,710 more rows

So my code is:

outcome <- "heart attack"
c <- arrange(b, State, sym(outcome)) %>%
    group_by(State) %>%
    mutate(rank = row_number(sym(outcome)))

But I got this error:

Error in arrange_impl(.data, dots) : object 'heart attack' not found

When I ran sym(outcome) on its own and pasted the result into my code, it worked:

> sym(outcome)
`heart attack`
> c<-arrange(b,State,`heart attack`)%>%
+                         group_by(State)%>%
+                 mutate(rank=rank(`heart attack`))
> c
# A tibble: 2,720 x 4
# Groups:   State [54]
                      Hospital.Name State `heart attack`  rank
                              <chr> <chr>          <chr> <dbl>
 1 PROVIDENCE ALASKA MEDICAL CENTER    AK           13.4     1
 2         ALASKA REGIONAL HOSPITAL    AK           14.5     2
 3      FAIRBANKS MEMORIAL HOSPITAL    AK           15.5     3
 4     ALASKA NATIVE MEDICAL CENTER    AK           15.7     4
 5   MAT-SU REGIONAL MEDICAL CENTER    AK           17.7     5
 6         CRESTWOOD MEDICAL CENTER    AL           13.3     1
 7      BAPTIST MEDICAL CENTER EAST    AL           14.2     2
 8 SOUTHEAST ALABAMA MEDICAL CENTER    AL           14.3     3
 9               GEORGIANA HOSPITAL    AL           14.5     4
10      PRATTVILLE BAPTIST HOSPITAL    AL           14.6     5
# ... with 2,710 more rows

This is part of a function, so 'outcome' needs to be a string. I therefore tried to convert the string to a symbol so that I could reference the column in dplyr. Can anyone tell me what's happening here? Is there a good way to achieve my goal?

Yin asked Oct 21 '17 16:10


2 Answers

You need to unquote the symbol with !!:

arrange(b, State, !!sym(outcome))

Or with UQ() (deprecated in later rlang releases in favor of !!):

arrange(b, State, UQ(sym(outcome)))

Similarly for mutate:

mutate(rank=row_number(!!sym(outcome)))   # or mutate(rank=row_number(UQ(sym(outcome))))
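
Putting the pieces together, here is a minimal sketch of how the question's code might look inside a function. The function name rank_by_outcome and the toy data frame are made up for illustration; only the !!sym() pattern comes from the answer above.

```r
library(dplyr)
library(rlang)  # provides sym(); loaded explicitly for clarity

# Hypothetical helper: rank rows within each State by a column
# whose name is passed in as a string.
rank_by_outcome <- function(df, outcome) {
  col <- sym(outcome)                 # string -> symbol
  df %>%
    arrange(State, !!col) %>%         # !! unquotes the symbol inside the verb
    group_by(State) %>%
    mutate(rank = row_number(!!col))
}

# Toy data standing in for the hospital table (values are made up)
b <- data.frame(
  Hospital.Name = c("A", "B", "C", "D"),
  State = c("AL", "AK", "AL", "AK"),
  `heart attack` = c(14.3, 15.5, 13.3, 13.4),
  check.names = FALSE,                # keep the space in the column name
  stringsAsFactors = FALSE
)

res <- rank_by_outcome(b, "heart attack")
print(res)
```

Converting the string once with sym() and reusing the symbol keeps the two unquoting sites consistent.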

Psidom answered Oct 05 '22 07:10


If you only need to refer to the column by name, use the backtick (`). (It usually shares a key with ~ at the top left of your keyboard, just below the ESC key.) Note that it is not the same character as the single quotation mark (').

Names written like this usually come from importing headers that contain spaces into a tibble. Any column name containing a space is printed wrapped in `. To refer to such a column you must wrap it in backticks as well; otherwise R does not recognize that you mean an object in memory that it can work with. Written in quotes, R treats it as a plain string, not as the object. R will, however, happily create an object with a space in its name if you assign to it using " or '.

See the demonstration below:

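`tidy time` <- 4     # creates an object named "tidy time"
'tidy time' <- 5     # a quoted name on the left of <- assigns to the same object
"tidy time" <- 6     # likewise; the object now holds 6
print('tidy time')   # prints the string "tidy time", not the object
print("tidy time")   # same: a string literal
print(`tidy time`)   # prints the object's value, 6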

This is the cause of R's error message.
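
The same distinction can be seen with a data frame column. The sketch below uses base R only; the data frame df and its values are made up for illustration.

```r
# Toy data frame with a space in a column name (values are made up)
df <- data.frame(`heart attack` = c(14.3, 18.5), check.names = FALSE)

names(df)                        # the stored name contains a space, no backticks
df$`heart attack`                # backticks: refers to the column itself
df[["heart attack"]]             # a string is fine where a name is looked up as data
with(df, `heart attack` * 2)     # backticks inside an expression

quoted <- with(df, "heart attack")   # in quotes this is just a string
```

Here quoted holds the text "heart attack" and not the column values, which is the string-versus-object difference described above.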

Hopefully, understanding this will spare you from having to call the sym function at all. In any case, if you remove the space from the name, the problem goes away and you can save the backticks for another day.

To learn more about !! and unquoting variables (which Psidom refers to in his answer), and about the related issues that arise when writing functions that reference objects through non-standard evaluation in dplyr, see: https://rpubs.com/hadley/dplyr-programming
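
As an alternative that neither answer mentions, current dplyr documentation also describes the .data pronoun, which accepts a column name as a string directly. The sketch below assumes a dplyr version that supports .data; the function name rank_with_pronoun and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: same goal as in the question, but using
# .data[[outcome]] instead of converting the string with sym().
rank_with_pronoun <- function(df, outcome) {
  df %>%
    arrange(State, .data[[outcome]]) %>%
    group_by(State) %>%
    mutate(rank = row_number(.data[[outcome]]))
}

# Toy data (values are made up)
toy <- data.frame(
  State = c("AL", "AK", "AL"),
  `heart attack` = c(14.3, 13.4, 13.3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

out <- rank_with_pronoun(toy, "heart attack")
print(out)
```

With this form the space in the column name needs no backticks, because the name stays a string throughout.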

leerssej answered Oct 05 '22 05:10