I'm looking for a regex that removes single characters from a string, with a few conditions. One regex should remove all single characters in a string, and the other should only remove single characters between the first and last word, so names that start with an initial are left untouched. See the samples below.
Before (remove all single characters):
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
"J. J. Smith", "J. Thomas")
After:
"John Smith", "Chris Anderson", "Mary Jane", "Smith", "Thomas"
Before (remove only the single characters in the middle):
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
"J. J. Smith", "J. Thomas")
After:
"John Smith", "Chris Anderson", "Mary Jane", "J. J. Smith", "J. Thomas"
Edited because I missed part of the question.
gsub can delete a pattern from your data. For the second case, we remove single characters that have at least two word characters both before and after them, and put those captured characters back with \\1 and \\2:
gsub("(\\w\\w)\\W+\\w\\W+(\\w\\w)", "\\1 \\2", names)
[1] "John Smith" "Chris Anderson" "Mary Jane" "J. J. Smith" "J. Thomas"
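To see why the replacement has to restore \\1 and \\2, it can help to print what the pattern actually matches. This is a small sketch using base R's regexpr() and regmatches(); the variable name middle_pat is just an illustrative choice:

```r
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

# Pattern from above: two word characters, separators, one word character,
# separators, two word characters.
middle_pat <- "(\\w\\w)\\W+\\w\\W+(\\w\\w)"

# regexpr() finds the first match in each string (-1 if there is none);
# regmatches() returns only the substrings that matched.
regmatches(names, regexpr(middle_pat, names))
#[1] "hn C. Sm" "is T. An" "ry H. Ja"
```

The two letters on each side of the initial are consumed by the match, which is why they are captured and written back. "J. J. Smith" and "J. Thomas" never have two word characters before an initial, so they produce no match and stay unchanged.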
To get rid of all of them (trimws() removes the leading space left behind when a name starts with an initial):
trimws(gsub("\\W*\\b\\w\\b\\W*", " ", names))
[1] "John Smith" "Chris Anderson" "Mary Jane" "Smith" "Thomas"
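The all-characters pattern replaces each match with a space, so initials at the start of a string leave whitespace behind. Listing the matches with gregexpr() makes this visible. The sketch below passes perl = TRUE as an assumption, so that \\b is handled by PCRE (the default engine is expected to give the same matches here), and then tidies up by collapsing repeated spaces and trimming the ends:

```r
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")
all_pat <- "\\W*\\b\\w\\b\\W*"

# gregexpr() finds every match; regmatches() returns them as a list.
matches <- regmatches(names, gregexpr(all_pat, names, perl = TRUE))
matches[[4]]  # pieces removed from "J. J. Smith"
#[1] "J. " "J. "

# Replace each match with a space, collapse runs of spaces, trim both ends.
cleaned <- trimws(gsub(" +", " ", gsub(all_pat, " ", names, perl = TRUE)))
cleaned
```

Each "J. " becomes its own space, so "J. J. Smith" first turns into "  Smith". Collapsing runs of spaces as well as trimming also covers consecutive initials in the middle of a name, which would otherwise leave a double space between the remaining words.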
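Here is another option, which removes any uppercase letter that starts a word and is directly followed by punctuation (such as "C."), together with the whitespace after it: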
gsub("\\b[A-Z][[:punct:]]\\s*", "", names)
#[1] "John Smith" "Chris Anderson" "Mary Jane" "Smith"
#[5] "Thomas"
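One behavioural difference between the two approaches: this pattern only removes an uppercase letter that is directly followed by punctuation, so an initial written without a period is kept, while the \\W*\\b\\w\\b\\W* pattern removes any standalone word character. A quick check with a made-up input, "John C Smith", which is not in the question (perl = TRUE is again an assumption for the \\b pattern):

```r
no_dot <- "John C Smith"  # hypothetical input: initial without a period

# Uppercase letter + punctuation: "C" is followed by a space, so nothing matches.
gsub("\\b[A-Z][[:punct:]]\\s*", "", no_dot)
#[1] "John C Smith"

# Any single word character between word boundaries: " C " is replaced.
trimws(gsub("\\W*\\b\\w\\b\\W*", " ", no_dot, perl = TRUE))
#[1] "John Smith"
```

Which behaviour is preferable depends on how initials are written in the real data.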
Or, for the second case, keep the first word and drop the initials that follow it:
sub("(\\w+)\\s+([A-Z][[:punct:]]\\s*){1,}", "\\1 ", names)
#[1] "John Smith" "Chris Anderson" "Mary Jane" "J. J. Smith"
#[5] "J. Thomas"
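The {1,} quantifier lets this pattern drop several consecutive initials after the first word, whereas the (\\w\\w)\\W+\\w\\W+(\\w\\w) pattern expects exactly one initial between the two longer words. A check against a hypothetical input with two middle initials, "John C. D. Smith", which is not from the question:

```r
two_initials <- "John C. D. Smith"  # hypothetical input with two middle initials

# Keeps the first word, then drops every "X. " that follows it.
sub("(\\w+)\\s+([A-Z][[:punct:]]\\s*){1,}", "\\1 ", two_initials)
#[1] "John Smith"

# After "C. " the pattern needs two word characters, but "D." follows,
# so there is no match and the string comes back unchanged.
gsub("(\\w\\w)\\W+\\w\\W+(\\w\\w)", "\\1 \\2", two_initials)
#[1] "John C. D. Smith"
```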