I'm trying to convert a character string to numeric and have encountered some unexpected behaviour with str_replace. Here's a minimal working example:
library(stringr)
x <- c("0", "NULL", "0")
# This works, i.e. 0 NA 0
as.numeric(str_replace(x, "NULL", ""))
# This doesn't, i.e. NA NA NA
as.numeric(str_replace(x, "NULL", NA))
To my mind, the second example should work, as it should only replace the second entry in the vector with NA (which is a valid value in a character vector). But it doesn't: the inner str_replace converts all three entries to NA.
What's going on here? I had a look through the documentation for str_replace and stri_replace_all but don't see an obvious explanation.
EDIT: To clarify, this is with stringr_1.0.0 and stringi_1.0-1 on R 3.1.3, Windows 7.
This was a bug in the stringi package, but it is now fixed (recall that stringr is based on stringi, so the former was affected too).
With the most recent development version we get:
stri_replace_all_fixed(c("1", "NULL"), "NULL", NA)
## [1] "1" NA
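If you are stuck on an affected version, the replacement can also be done without any string-replacement function. A minimal base R sketch, assuming the goal is only to turn the literal string "NULL" into a missing value before converting:

```r
x <- c("0", "NULL", "0")

# Replace only the elements that are exactly "NULL" with NA,
# leaving every other element untouched
cleaned <- replace(x, x == "NULL", NA_character_)

# Convert to numeric; the NA element stays NA without a coercion warning
result <- as.numeric(cleaned)
print(result)
```

Because this uses an exact comparison rather than pattern matching, it will not touch strings that merely contain "NULL" as a substring.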
Look at the source code of str_replace.
function (string, pattern, replacement)
{
replacement <- fix_replacement(replacement)
switch(type(pattern), empty = , bound = stop("Not implemented",
call. = FALSE), fixed = stri_replace_first_fixed(string,
pattern, replacement, opts_fixed = attr(pattern, "options")),
coll = stri_replace_first_coll(string, pattern, replacement,
opts_collator = attr(pattern, "options")), regex = stri_replace_first_regex(string,
pattern, replacement, opts_regex = attr(pattern,
"options")), )
}
<environment: namespace:stringr>
This leads to fix_replacement, which is on GitHub and which I've also put below. If you run it in your main environment, you find that fix_replacement(NA) returns NA. You can see that it relies on stri_replace_all_regex, which is from the stringi package.
fix_replacement <- function(x) {
stri_replace_all_regex(
stri_replace_all_fixed(x, "$", "\\$"),
"(?<!\\\\)\\\\(\\d)",
"\\$$1")
}
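To check that claim yourself, here is a small sketch that copies the helper from the listing above into the global environment and calls it with NA. It assumes stringi is installed; the name na_result is only for illustration.

```r
library(stringi)

# Copy of the stringr helper shown above, defined locally for inspection
fix_replacement <- function(x) {
  stri_replace_all_regex(
    stri_replace_all_fixed(x, "$", "\\$"),
    "(?<!\\\\)\\\\(\\d)",
    "\\$$1")
}

# NA goes in and NA comes out, so the helper itself is not what
# turns the whole vector into NA
na_result <- fix_replacement(NA)
print(is.na(na_result))
```

Since the helper passes NA through unchanged, the odd behaviour has to come from the stri_replace_first_* call that receives it.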
The interesting thing is that stri_replace_first_fixed and stri_replace_first_regex both return c(NA,NA,NA) when run with your parameters (your string, pattern, and replacement). The problem is that stri_replace_first_fixed and stri_replace_first_regex are C++ code, so it gets a little trickier to figure out what's happening.
stri_replace_first_fixed can be found here.
stri_replace_first_regex can be found here.
As far as I can discern with limited time and my relatively rusty C++ knowledge, the function stri__replace_allfirstlast_fixed checks the replacement argument using stri_prepare_arg_string. According to the documentation for that, it will throw an error if it encounters an NA. I don't have time to fully trace it beyond this, but I would suspect that this error may be causing the odd return of all NAs.
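One way to probe that suspicion from the R side, without reading the C++ source, is to wrap the call in tryCatch and see whether a warning or error is actually signalled. This is only a sketch: what it reports depends on the installed stringi version, so no particular output is claimed here.

```r
library(stringi)

x <- c("0", "NULL", "0")

# Capture any warning or error instead of letting it pass silently;
# if nothing is signalled, res is simply the returned character vector
res <- tryCatch(
  stri_replace_first_fixed(x, "NULL", NA_character_),
  warning = function(w) w,
  error = function(e) e
)

if (inherits(res, "condition")) {
  message("Condition signalled: ", conditionMessage(res))
} else {
  # Compare against the input to see which elements became NA
  print(data.frame(input = x, output = res, stringsAsFactors = FALSE))
}
```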
Here's a solution using dplyr's across function and the stringr package.
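library(dplyr)
library(stringr)
df <- data.frame(x = c("a", "b", "null", "e"),
                 y = c("g", "null", "h", "k"))
# Replace every "null" entry with a missing value, column by column
df2 <- df %>%
  mutate(across(everything(), function(col) str_replace(col, "null", NA_character_)))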
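If the aim is only to turn exact "null" entries into missing values, dplyr's na_if may be a simpler fit than a pattern replacement. A sketch under that assumption, using the same example data (the name df3 is just for illustration):

```r
library(dplyr)

df <- data.frame(x = c("a", "b", "null", "e"),
                 y = c("g", "null", "h", "k"),
                 stringsAsFactors = FALSE)

# na_if() sets an element to NA only when it equals "null" exactly,
# so no regular expression matching is involved
df3 <- df %>%
  mutate(across(everything(), function(col) na_if(col, "null")))

print(df3)
```

Unlike str_replace, this does not match "null" inside longer strings, which is usually what you want when "null" is a sentinel for missing data.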