My data frame looks like:
> str(b)
'data.frame': 2720 obs. of 3 variables:
$ Hospital.Name: chr "SOUTHEAST ALABAMA MEDICAL CENTER" "MARSHALL MEDICAL CENTER SOUTH" "ELIZA COFFEE MEMORIAL HOSPITAL" "ST VINCENT'S EAST" ...
$ State : chr "AL" "AL" "AL" "AL" ...
$ heart attack : num 14.3 18.5 18.1 17.7 18 15.9 19.6 17.3 17.8 17.5 ...
I want to group it by State, sort it by State and heart attack, and then add a column that returns the row number within each group. The ideal result would look like:
# A tibble: 2,720 x 4
# Groups: State [54]
Hospital.Name State `heart attack` rank
<chr> <chr> <dbl> <int>
1 PROVIDENCE ALASKA MEDICAL CENTER AK 13.4 1
2 ALASKA REGIONAL HOSPITAL AK 14.5 2
3 FAIRBANKS MEMORIAL HOSPITAL AK 15.5 3
4 ALASKA NATIVE MEDICAL CENTER AK 15.7 4
5 MAT-SU REGIONAL MEDICAL CENTER AK 17.7 5
6 CRESTWOOD MEDICAL CENTER AL 13.3 1
7 BAPTIST MEDICAL CENTER EAST AL 14.2 2
8 SOUTHEAST ALABAMA MEDICAL CENTER AL 14.3 3
9 GEORGIANA HOSPITAL AL 14.5 4
10 PRATTVILLE BAPTIST HOSPITAL AL 14.6 5
# ... with 2,710 more rows
So my code is:
outcome<-"heart attack"
c<-arrange(b,State,sym(outcome))%>%
group_by(State)%>%
mutate(rank=row_number(sym(outcome)))
But I got this error:
Error in arrange_impl(.data, dots) : object 'heart attack' not found
When I ran sym(outcome) on its own and copied the result into my code, it worked:
sym(outcome)
`heart attack`
c<-arrange(b,State,`heart attack`)%>%
+ group_by(State)%>%
+ mutate(rank=rank(`heart attack`))
> c
# A tibble: 2,720 x 4
# Groups: State [54]
Hospital.Name State `heart attack` rank
<chr> <chr> <chr> <dbl>
1 PROVIDENCE ALASKA MEDICAL CENTER AK 13.4 1
2 ALASKA REGIONAL HOSPITAL AK 14.5 2
3 FAIRBANKS MEMORIAL HOSPITAL AK 15.5 3
4 ALASKA NATIVE MEDICAL CENTER AK 15.7 4
5 MAT-SU REGIONAL MEDICAL CENTER AK 17.7 5
6 CRESTWOOD MEDICAL CENTER AL 13.3 1
7 BAPTIST MEDICAL CENTER EAST AL 14.2 2
8 SOUTHEAST ALABAMA MEDICAL CENTER AL 14.3 3
9 GEORGIANA HOSPITAL AL 14.5 4
10 PRATTVILLE BAPTIST HOSPITAL AL 14.6 5
# ... with 2,710 more rows
This is part of a function, so 'outcome' needs to be a string. I therefore tried to convert the string to a symbol so that I could reference the column in dplyr. Can anyone tell me what's happening here? Is there a good way to achieve my goal?
You need to unquote the symbol with !!:
arrange(b, State, !!sym(outcome))
Or UQ (deprecated in newer versions of rlang in favor of !!):
arrange(b, State, UQ(sym(outcome)))
Similarly for mutate:
mutate(rank=row_number(!!sym(outcome))) # or mutate(rank=row_number(UQ(sym(outcome))))
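Since the question says this lives inside a function that receives the column name as a string, here is a minimal sketch of how the pieces might fit together. The function name rank_by_outcome and the toy data frame are made up for illustration; only State and the "heart attack" column come from the question.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (name not from the original post): ranks rows within
# each State by the column whose name is passed as a string.
rank_by_outcome <- function(df, outcome) {
  col <- sym(outcome)                    # string -> symbol
  df %>%
    arrange(State, !!col) %>%            # !! unquotes the symbol inside arrange
    group_by(State) %>%
    mutate(rank = row_number(!!col))     # and again inside mutate
}

# Made-up data for illustration only; check.names = FALSE keeps the space
# in the column name instead of converting it to heart.attack.
toy <- data.frame(
  Hospital.Name = c("A", "B", "C", "D"),
  State = c("AL", "AK", "AL", "AK"),
  `heart attack` = c(14.3, 15.5, 13.3, 13.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

res <- rank_by_outcome(toy, "heart attack")
print(res)
```

Building the symbol once and unquoting it in each verb keeps the string-to-symbol conversion in one place.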
If you are only trying to name the column, then you will want to use the backtick (`). (It is typically paired with the ~ at the top left of your keyboard, just below the ESC key.) Note that it is not the same as the single quotation mark (').
Variable names often end up written like this because header names containing spaces were imported into a tibble. Any header name with a space in it is displayed wrapped in backticks. You need to refer to those columns by wrapping them in backticks as well; otherwise R does not recognize that you are referring to an object in memory and treats the name as a plain string. R will, however, happily create an object with a space in its name if you assign to it using " or '.
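See the demonstration of the issue below:
`tidy time` <- 4    # assigns to a variable named tidy time
'tidy time' <- 5    # a quoted name on the left of <- assigns to the same variable
"tidy time" <- 6    # same again; the variable now holds 6
print('tidy time')  # prints the string itself, not the variable
print("tidy time")  # prints the string itself, not the variable
print(`tidy time`)  # prints the value stored in the variable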
This is the cause of R's error message.
Hopefully understanding all that will spare you from having to call on the sym function. In any case, if you remove the space in the name the problem will also go away and you can save the backticks for another day.
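To make the backtick point concrete for a data frame column rather than a standalone variable, here is a small base R sketch. The toy data frame is made up for illustration.

```r
# Made-up data; check.names = FALSE keeps the space in the column name.
toy <- data.frame(`heart attack` = c(14.3, 13.3), check.names = FALSE)

by_backtick <- toy$`heart attack`    # backticks let $ accept a name with a space
by_string   <- toy[["heart attack"]] # [[ ]] takes the name as an ordinary string

print(names(toy))                    # the stored name is a plain string
print(identical(by_backtick, by_string))
```

Both forms reach the same column: the backticks are only syntax for writing a non-standard name in code, while the name itself is stored as a plain string.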
To learn more about !! and unquoting variables (which psidom was referring to in his answer), and about the related issues that come up when writing functions that reference objects through non-standard evaluation in dplyr, please see here: https://rpubs.com/hadley/dplyr-programming
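As an alternative that is not mentioned in either answer above, more recent versions of dplyr also let you look a column up by string through the .data pronoun, which avoids sym and !! altogether. This is a sketch that assumes a dplyr version with .data support; the toy data frame is made up for illustration.

```r
library(dplyr)

outcome <- "heart attack"

# Made-up data for illustration only
toy <- data.frame(
  Hospital.Name = c("A", "B", "C", "D"),
  State = c("AL", "AK", "AL", "AK"),
  `heart attack` = c(14.3, 15.5, 13.3, 13.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

res <- toy %>%
  arrange(State, .data[[outcome]]) %>%        # .data[[...]] looks the column up by string
  group_by(State) %>%
  mutate(rank = row_number(.data[[outcome]]))

print(res)
```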