
More than 9 backreferences in gsub()

Tags: regex, r, gsub

How can I use gsub() with more than 9 backreferences? I would expect the output in the example below to be "e, g, i, j, o".

> test <- "abcdefghijklmnop"
> gsub("(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)", "\\5, \\7, \\9, \\10, \\15", test, perl = TRUE)
[1] "e, g, i, a0, a5"

learnr asked Sep 09 '09 17:09


2 Answers

See Regular Expressions with The R Language:

You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. There is no replacement text token for the overall match. Place the entire regex in a capturing group and then use \1.

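But with PCRE (perl = TRUE) you should be able to use named groups. So try (?P<name>regex) for group naming and (?P=name) as a backreference. Note that (?P=name) refers back to the group inside the pattern itself; the replacement text of gsub() still only understands \1 through \9.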
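
Because the replacement text stops at \9, "\\10" is read as \1 followed by a literal 0, which is where the "a0" in the question comes from. One possible base R workaround, offered as a sketch rather than the definitive fix, is to skip the replacement string and pull the groups out with regexec() and regmatches(). The variable names pattern, m, groups and result are only illustrative, and the approach assumes you want the groups of the first match, which here spans the whole string:

```r
test <- "abcdefghijklmnop"

# Build the sixteen one-character capture groups from the question.
pattern <- paste(rep("(\\w)", 16), collapse = "")

# regexec() reports the full match followed by every capture group,
# so groups beyond 9 can be reached by ordinary indexing.
m <- regmatches(test, regexec(pattern, test))[[1]]
groups <- m[-1]  # drop the full match; groups[k] is capture group k

result <- paste(groups[c(5, 7, 9, 10, 15)], collapse = ", ")
print(result)
## [1] "e, g, i, j, o"
```

Unlike gsub(), this does not substitute inside a longer string; it only assembles the selected groups, which is enough when the pattern covers the entire input as it does here.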

Gumbo answered Nov 15 '22 21:11


The stri_replace_*_regex functions from the stringi package do not have such limitations:

library("stringi")
stri_replace_all_regex("abcdefghijkl", "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)", "$10$1$11$12")
## [1] "jakl"

If you want to follow the 1st capture group with a literal 1, escape the digit, e.g.

stri_replace_all_regex("abcdefghijkl", "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)", "$10$1$1\\1$12")
## [1] "jaa1l"
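
Applied to the string from the question, the same approach might look like the sketch below. It assumes the stringi package is installed; the names test, pattern and result are only illustrative.

```r
library("stringi")

test <- "abcdefghijklmnop"

# Sixteen one-character capture groups, as in the question.
pattern <- paste(rep("(\\w)", 16), collapse = "")

# $10 and $15 are read as groups 10 and 15 because the pattern
# has at least that many capture groups.
result <- stri_replace_all_regex(test, pattern, "$5, $7, $9, $10, $15")
print(result)
## [1] "e, g, i, j, o"
```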

gagolews answered Nov 15 '22 22:11