
using captured groups in str_replace / stri_replace - stringi vs stringr [duplicate]

Tags: r, stringi

Most stringr functions are just wrappers around corresponding stringi functions. str_replace_all is one of those. Yet my code does not work with stri_replace_all, the corresponding stringi function.

I am writing a quick regex to convert (a subset of) camel case to spaced words.

I am quite puzzled as to why this works:

str <- "thisIsCamelCase aintIt"
stringr::str_replace_all(str, 
                         pattern="(?<=[a-z])([A-Z])", 
                         replacement=" \\1")
# "this Is Camel Case aint It"

And this does not:

stri_replace_all(str, 
                 regex="(?<=[a-z])([A-Z])", 
                 replacement=" \\1")
# "this 1s 1amel 1ase aint 1t"
asked Aug 19 '16 10:08 by asachet


2 Answers

If you look at the source for stringr::str_replace_all, you'll see that it calls fix_replacement(replacement) to convert \\1-style capture group references to the $1 style. stringi does no such conversion: the help for stringi::stri_replace_all clearly shows that you use $1, $2, etc. for the capture groups.

str <- "thisIsCamelCase aintIt"
stri_replace_all(str, regex="(?<=[a-z])([A-Z])", replacement=" $1")
## [1] "this Is Camel Case aint It"
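
To make the conversion step concrete, here is a minimal sketch of that kind of rewrite. The helper name to_dollar_refs is hypothetical and this is not stringr's actual fix_replacement(); it only handles single-digit references and, as a simplifying assumption, does not escape literal $ characters in the replacement.

```r
# Hypothetical helper (NOT stringr's actual fix_replacement()):
# rewrite "\\1"-style group references as "$1"-style references.
to_dollar_refs <- function(replacement) {
  # The regex \\([0-9]) matches a literal backslash followed by one digit;
  # the replacement emits "$" followed by that captured digit.
  gsub("\\\\([0-9])", "$\\1", replacement)
}

converted <- to_dollar_refs(" \\1")
print(converted)

# Only runs if stringi is installed; passes the converted replacement on.
if (requireNamespace("stringi", quietly = TRUE)) {
  str <- "thisIsCamelCase aintIt"
  print(stringi::stri_replace_all(str,
                                  regex = "(?<=[a-z])([A-Z])",
                                  replacement = converted))
}
```

If you want to write the replacement once and use it with both packages, a small wrapper like this is one option; otherwise simply write $1 when calling stringi directly.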
answered Sep 21 '22 01:09 by hrbrmstr


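The option below avoids capture groups altogether: the pattern matches the zero-width position between a lowercase and an uppercase letter and inserts a space there, so it returns the same output with both packages.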

pat <- "(?<=[a-z])(?=[A-Z])"
str_replace_all(str, pat, " ")
#[1] "this Is Camel Case aint It"
stri_replace_all(str, regex=pat, " ")
#[1] "this Is Camel Case aint It"
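
As a cross-check that needs neither package, the same zero-width lookaround pattern can be tried with base R's gsub. This is only a sketch and assumes perl = TRUE, since lookbehind and lookahead are not available in gsub's default regex engine.

```r
str <- "thisIsCamelCase aintIt"

# Lookbehind for a lowercase letter, lookahead for an uppercase letter:
# the match is the empty position between them, which gets a space.
pat <- "(?<=[a-z])(?=[A-Z])"
spaced <- gsub(pat, " ", str, perl = TRUE)
print(spaced)
```

Because nothing is captured, there is no \\1 versus $1 difference to worry about here.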

The examples on the ?stri_replace_all help page show that capture groups are referenced as $1, $2 in the replacement:

stri_replace_all_regex('123|456|789', '(\\p{N}).(\\p{N})', '$2-$1')

So it should work if we replace the \\1 with $1:

stri_replace_all(str, regex = "(?<=[a-z])([A-Z])", " $1")
#[1] "this Is Camel Case aint It"
answered Sep 21 '22 01:09 by akrun