
gsub only part of pattern

Tags: string, r, gsub

I want to use gsub to correct some names that are in my data. I want names such as "R. J." and "A. J." to have no space between the letters.

For example:

x <- "A. J. Burnett"

I want to use gsub to match the pattern of his first name, and then remove the space:

gsub("[A-Z]\\.\\s[A-Z]\\.", "[A-Z]\\.[A-Z]\\.", x)

But I get:

[1] "[A-Z].[A-Z]. Burnett"

Obviously, instead of the [A-Z]'s I want the actual letters in the original name. How can I do this?

Colin asked May 24 '16 22:05


2 Answers

Use capture groups by enclosing patterns in (...), and refer to the captured patterns with \\1, \\2, and so on. In this example:

x <- "A. J. Burnett"
gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", x)
# [1] "A.J. Burnett"

Also note that in the replacement you don't need to escape the . characters, as they don't have a special meaning there.
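As a minimal sketch of the same idea applied to a whole vector: the `players` vector below is made-up sample data, not from the question. Each `\\1` and `\\2` in the replacement is filled with whatever the first and second `(...)` groups matched in that element, and elements without a match are returned unchanged.

```r
# Hypothetical sample data for illustration only
players <- c("A. J. Burnett", "R. J. Smith", "John Doe")

# ([A-Z]) captures each initial; the space (\\s) between them is matched
# but not captured, so it is dropped from the replacement
fixed <- gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", players)

print(fixed)
# [1] "A.J. Burnett" "R.J. Smith"   "John Doe"
```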

janos answered Oct 21 '22 00:10


You can use a look-ahead ((?=\\w\\.)) and a look-behind ((?<=\\b\\w\\.)) to target such spaces and replace them with "".

x <- c("A. J. Burnett", "Dr. R. J. Regex")
gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", x, perl = TRUE)
# [1] "A.J. Burnett"   "Dr. R.J. Regex"

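The look-ahead matches a word character (\\w) followed by a period (\\.), and the look-behind matches a word boundary (\\b) followed by a word character and a period. Both are zero-width assertions, so only the space between them is actually matched and replaced. Lookarounds are not supported by R's default regex engine, which is why perl = TRUE is required.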
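One place where the zero-width behaviour seems to matter is a name with three initials. The sketch below uses a made-up input (`y`, not from the question) to compare the two patterns from this page: a capture-group match consumes both initials, so the next search starts after them and the second space is left in place, whereas the lookaround version consumes only the space and can match again at the next one.

```r
# Hypothetical input with three initials
y <- "J. R. R. Tolkien"

# Capture groups: the match "J. R." consumes the first "R.",
# so the following " R." cannot pair up with it
with_groups <- gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", y)

# Lookarounds: only the space itself is consumed, so every
# space between two initials is removed
with_lookaround <- gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", y, perl = TRUE)

print(with_groups)
# [1] "J.R. R. Tolkien"
print(with_lookaround)
# [1] "J.R.R. Tolkien"
```

If names with more than two initials can occur in the data, the lookaround pattern is probably the safer choice of the two.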

Jota answered Oct 21 '22 01:10