I'm a fan of the revalue function in plyr for substituting strings. It's simple and easy to remember.
However, I've migrated new code to dplyr, which doesn't appear to have a revalue function. What is the accepted idiom in dplyr for doing what I previously did with revalue?
There is a recode function available starting with dplyr 0.5.0, and it looks very similar to revalue from plyr.
Example adapted from the Examples section of the recode documentation:
set.seed(16)
x = sample(c("a", "b", "c"), 10, replace = TRUE)
x
[1] "c" "a" "b" "a" "c" "a" "a" "c" "c" "a"
recode(x, a = "Apple", b = "Bear", c = "Car")
[1] "Car" "Apple" "Bear" "Apple" "Car" "Apple" "Apple" "Car" "Car" "Apple"
If you only define some of the values that you want to recode, by default the rest are filled with NA.
recode(x, a = "Apple", c = "Car")
[1] "Car" "Apple" NA "Apple" "Car" "Apple" "Apple" "Car" "Car" "Apple"
This behavior can be changed using the .default argument.
recode(x, a = "Apple", c = "Car", .default = x)
[1] "Car" "Apple" "b" "Apple" "Car" "Apple" "Apple" "Car" "Car" "Apple"
There is also a .missing argument if you want to replace missing values with something else.
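Since revalue() takes its replacements as a named vector, a revalue-style call can be reproduced by splicing a named vector into recode() with the !!! operator, a pattern the recode documentation also uses. This is a minimal sketch assuming a recent dplyr (1.x), where unmatched values are left unchanged; the vector x, the lookup table and the data frame df are made-up illustrations. The same call also works on a column inside mutate().

```r
library(dplyr)

# Illustrative input (hypothetical data)
x <- c("a", "b", "c", "a")

# revalue()-style lookup table: names are the old values, elements the new ones
lookup <- c(a = "Apple", b = "Bear", c = "Car")

# Splice the named vector into recode()'s ... arguments with !!!
recoded <- recode(x, !!!lookup)
print(recoded)

# The same call works on a data frame column inside mutate()
df <- data.frame(fruit = x, stringsAsFactors = FALSE)
df <- mutate(df, fruit = recode(fruit, !!!lookup))
print(df$fruit)
```

Keeping the mapping in a named vector means it can be defined once and reused across columns, much like the replace argument of revalue.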
We can do this with chartr from base R:
x <- c("a", "b", "c")
chartr("ac", "AC", x)
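Note that chartr() translates character by character rather than matching whole strings, so it suits one-character codes like these but cannot map "a" to a longer word such as "Apple". A quick base R check with made-up inputs:

```r
x <- c("a", "b", "c")

# Each character in old ("ac") becomes the character at the same position
# in new ("AC"); "b" has no mapping and is left alone
single <- chartr("ac", "AC", x)
print(single)

# The translation applies to every matching character inside each string,
# not only to elements that equal "a" or "c" exactly
multi <- chartr("ac", "AC", c("abc", "cab", "banana"))
print(multi)
```

The second call returns "AbC", "CAb" and "bAnAnA", which is why chartr is only a drop-in for revalue when every old and new value is a single character.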
I wanted to comment on the answer by @aosmith, but I lack the reputation. It seems that nowadays the default behavior of dplyr's recode function is to leave unspecified values unchanged.
x = sample(c("a", "b", "c"), 10, replace = TRUE)
x
[1] "c" "c" "b" "b" "a" "b" "c" "c" "c" "b"
recode(x, a = "apple", b = "banana")
[1] "c" "c" "banana" "banana" "apple" "banana" "c" "c" "c" "banana"
To turn all unspecified values into NA, include the argument .default = NA_character_.
recode(x, a = "apple", b = "banana", .default = NA_character_)
[1] NA NA "banana" "banana" "apple" "banana" NA NA NA "banana"
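The three behaviors can be compared side by side in a small sketch, assuming a current dplyr (1.x); the vector below, including its NA, is made up for illustration. It also exercises the .missing argument, which handles NA inputs separately from .default.

```r
library(dplyr)

# Illustrative input with one missing value (hypothetical data)
x <- c("a", "b", NA, "c")

# Default: the unmatched value "c" is kept and the NA stays NA
kept <- recode(x, a = "apple", b = "banana")

# .default = NA_character_ turns every unmatched, non-missing value into NA
dropped <- recode(x, a = "apple", b = "banana", .default = NA_character_)

# .default fills unmatched values, .missing fills NA inputs
filled <- recode(x, a = "apple", b = "banana",
                 .default = "other", .missing = "unknown")

print(kept)
print(dropped)
print(filled)
```

NA_character_ is used rather than a bare NA so that the replacement has the same type as the other (character) replacements.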