
Remove single characters in a string

Tags: regex, r

I'm looking for regexes that remove single characters from a string, under two different conditions. One regex should remove all single characters in a string; the other should remove only single characters that sit between the first and last words. See the samples below.

Remove all single characters:

Before:

names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

After:

"John Smith", "Chris Anderson", "Mary Jane", "Smith", "Thomas"

Remove single characters, except at the beginning or end of the string:

Before:

names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

After:

"John Smith", "Chris Anderson", "Mary Jane", "J. J. Smith", "J. Thomas"
asked Jan 02 '17 23:01 by DCRubyHound


2 Answers

Edited because I missed part of the question.

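gsub can delete a pattern from your data. Here, we remove single characters that have multi-character strings both before and after them: the two capture groups keep the neighboring characters, and the replacement joins them with a single space.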

gsub("(\\w\\w)\\W+\\w\\W+(\\w\\w)", "\\1 \\2", names)
[1] "John Smith"     "Chris Anderson" "Mary Jane"   "J. J. Smith" "J. Thomas"

To get rid of all of them (note the leading spaces left in the last two results):

gsub("\\W*\\b\\w\\b\\W*", " ", names)
[1] "John Smith"     "Chris Anderson" "Mary Jane"      "  Smith"        " Thomas" 
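
If the leftover whitespace matters, one possible cleanup is to collapse runs of spaces and trim the ends afterwards. This is only a sketch on top of the call above; the variable name `cleaned` and the extra whitespace step are my additions, not part of the original pattern.

```r
names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

# Replace each isolated single character, together with the non-word
# characters around it, by one space (same call as above).
cleaned <- gsub("\\W*\\b\\w\\b\\W*", " ", names)

# Collapse any run of whitespace to one space, then strip leading and
# trailing whitespace (trimws() needs R >= 3.2.0).
cleaned <- trimws(gsub("\\s+", " ", cleaned))
cleaned
```

The collapse step is not needed for these five names, but it should help when two removed initials sit next to each other in the middle of a name.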
answered Oct 04 '22 20:10 by G5W


Here is another option for the first case, which removes every capital letter that is followed by punctuation:

gsub("\\b[A-Z][[:punct:]]\\s*", "", names)
#[1] "John Smith"     "Chris Anderson" "Mary Jane"      "Smith"         
#[5] "Thomas"        

Or, for the second case, which leaves initials at the start of the string alone:

sub("(\\w+)\\s+([A-Z][[:punct:]]\\s*){1,}", "\\1 ", names)
#[1] "John Smith"     "Chris Anderson" "Mary Jane"      "J. J. Smith"   
#[5] "J. Thomas"     
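
If both behaviours are needed in the same script, the two calls can be wrapped in a small function. This is just a sketch: `drop_initials` and its `keep_leading` argument are hypothetical names I made up, and it assumes, like the patterns above, that an initial is a capital letter followed by a punctuation character.

```r
# Hypothetical helper (not a base R function): drop initials such as "C."
# keep_leading = TRUE leaves initials at the start of a name untouched.
drop_initials <- function(x, keep_leading = FALSE) {
  if (keep_leading) {
    # Second case: only initials that follow a whitespace-terminated word.
    sub("(\\w+)\\s+([A-Z][[:punct:]]\\s*){1,}", "\\1 ", x)
  } else {
    # First case: every capital letter followed by punctuation.
    gsub("\\b[A-Z][[:punct:]]\\s*", "", x)
  }
}

names <- c("John C. Smith", "Chris T. Anderson", "Mary H. Jane",
           "J. J. Smith", "J. Thomas")

all_removed  <- drop_initials(names)
middle_only  <- drop_initials(names, keep_leading = TRUE)
```

Note that initials written without punctuation (for example "John C Smith") would not be matched by either branch.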
answered Oct 04 '22 20:10 by akrun