I want to use gsub to correct some names that are in my data. I want names such as "R. J." and "A. J." to have no space between the letters.
For example:
x <- "A. J. Burnett"
I want to use gsub to match the pattern of his first name, and then remove the space:
gsub("[A-Z]\\.\\s[A-Z]\\.", "[A-Z]\\.[A-Z]\\.", x)
But I get:
[1] "[A-Z].[A-Z]. Burnett"
Obviously, instead of the [A-Z]'s I want the actual letters in the original name. How can I do this?
Use capture groups by enclosing patterns in (...), and refer to the captured patterns with \\1, \\2, and so on. In this example:
x <- "A. J. Burnett"
gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", x)
[1] "A.J. Burnett"
Also note that in the replacement you don't need to escape the . characters, as they don't have a special meaning there.
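Because gsub() is vectorized over its x argument, the same call can clean a whole column of names at once. Below is a minimal sketch that wraps it in a small helper. The helper name join_initials and the sample names are illustrative assumptions, not part of the original answer. The last call shows a limitation of this pattern: it joins initials in pairs only.

```r
# Hypothetical helper (not from the original answer): applies the
# capture-group gsub() call to a whole character vector.
join_initials <- function(names) {
  # ([A-Z]) captures one capital letter; \\1 and \\2 paste the captured
  # letters back into the replacement, so only the whitespace is dropped.
  gsub("([A-Z])\\.\\s([A-Z])\\.", "\\1.\\2.", names)
}

# Illustrative sample data
players <- c("A. J. Burnett", "C. J. Smith", "Mike Trout")
join_initials(players)
# "Mike Trout" contains no initials, so it is returned unchanged.

# Matches do not overlap: once "J. R." has been consumed, the rest of the
# string (" R. Tolkien") no longer fits the pattern, so the third initial
# stays separated.
join_initials("J. R. R. Tolkien")
```

With these inputs the calls return "A.J. Burnett", "C.J. Smith", "Mike Trout" and then "J.R. R. Tolkien".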
Alternatively, you can use a look-ahead ((?=\\w\\.)) and a look-behind ((?<=\\b\\w\\.)) to target such spaces and replace them with "". R's default regex engine does not support look-arounds, so this needs perl = TRUE.
x <- c("A. J. Burnett", "Dr. R. J. Regex")
gsub("(?<=\\b\\w\\.) (?=\\w\\.)", "", x, perl = TRUE)
# [1] "A.J. Burnett" "Dr. R.J. Regex"
The look-ahead matches a word character (\\w) followed by a period (\\.), and the look-behind matches a word boundary (\\b) followed by a word character and a period.
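One way to check which spaces the look-around pattern targets is gregexpr() with the same pattern and perl = TRUE, which reports the character positions of each match. This sketch also adds a third, illustrative name with three initials ("J. R. R. Tolkien" is an assumption, not from the question). Because the look-arounds are zero-width, each match consumes only the space, so a middle initial like "R." can act as the look-ahead for one space and the look-behind for the next.

```r
x <- c("A. J. Burnett", "Dr. R. J. Regex", "J. R. R. Tolkien")
pattern <- "(?<=\\b\\w\\.) (?=\\w\\.)"

# gregexpr() returns one integer vector of match positions per string;
# the "match.length" attribute shows each match is a single space.
m <- gregexpr(pattern, x, perl = TRUE)
m[[1]]  # the space between "A." and "J." (position 3)
m[[2]]  # only the space between "R." and "J." (position 7); "Dr." has no
        # word boundary before "r", so the first space is left alone
m[[3]]  # both spaces between the three initials (positions 3 and 6)

# Removing only those spaces joins every run of initials
gsub(pattern, "", x, perl = TRUE)
```

The final call should give "A.J. Burnett", "Dr. R.J. Regex" and "J.R.R. Tolkien".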