
How does "gsub" handle spaces?

Tags: regex, r

I have a character string "ab b cde", i.e. "ab[space]b[space]cde". I want to replace "space-b" and "space-c" with blank spaces, so that the output string is "ab[space][space][space][space]de". I can't figure out how to get rid of the second "b" without deleting the first one. I have tried:

gsub("[\\sb,\\sc]", " ", "ab b cde", perl=T)

but this is giving me "a[spaces]de". Any pointers? Thanks.

Edit: Consider a more complicated problem: I want to convert the string "akui i ii", i.e. "akui[space]i[space]ii", to "akui[spaces]" by removing the "space-i" and "space-ii".

user702432 asked Feb 14 '12 09:02



2 Answers

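[\sb,\sc] is a character class: it matches any single character that is whitespace, b, a comma, or c (the second \s adds nothing). That is why the b in "ab" gets replaced as well. You probably want something like (\sb|\sc), which means "whitespace followed by b, or whitespace followed by c", or the shorter \s[bc], which means "whitespace followed by b or c". Each match is two characters long, so the replacement below is two spaces.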

s <- "ab b cde"
gsub( "(\\sb|\\sc)",     "  ", s, perl=TRUE )
gsub( "\\s[bc]",         "  ", s, perl=TRUE )
gsub( "[[:space:]][bc]", "  ", s, perl=TRUE )  # No backslashes
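To make the difference concrete, here is a minimal sketch that runs the original attempt and one corrected pattern side by side. The variable names `attempt` and `fixed` are only illustrative, and the expected results in the comments assume the input string from the question.

```r
s <- "ab b cde"

# Original attempt: the brackets form a character class, so every single
# whitespace, "b", "," or "c" is replaced on its own, including the "b" in "ab".
attempt <- gsub("[\\sb,\\sc]", " ", s, perl = TRUE)

# Corrected pattern: one whitespace character followed by "b" or "c".
# Each two-character match is replaced by two spaces.
fixed <- gsub("\\s[bc]", "  ", s, perl = TRUE)

print(attempt)                   # "a", five spaces, "de"
print(fixed)                     # "ab", four spaces, "de"
print(nchar(fixed) == nchar(s))  # TRUE: the string keeps its length
```

Because each match and its replacement are both two characters long, the corrected call leaves the positions of the remaining letters unchanged.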

To remove a run of the same letter (as in the second example), put a + after that letter: \si+ matches a whitespace character followed by one or more i.

s2 <- "akui i ii"
gsub("\\si+", " ", s2)
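As a quick check of the call above, the sketch below shows what it returns for the second example. It also illustrates one caveat that is an assumption about the intended use, not something stated in the question: `\\si+` also matches the start of a longer word such as " in". The string `s3` and the word-boundary variant with `\\b` are hypothetical additions for illustration.

```r
s2 <- "akui i ii"

# " i" and " ii" are each replaced by a single space.
res <- gsub("\\si+", " ", s2)
print(res)  # "akui" followed by two spaces

# Hypothetical input with a word that merely starts with "i".
s3 <- "akui i ii in"

# Without a boundary, the " i" of " in" is replaced too, leaving a stray "n".
loose <- gsub("\\si+", " ", s3, perl = TRUE)

# With \\b, only runs of "i" that form a whole word are replaced.
strict <- gsub("\\si+\\b", " ", s3, perl = TRUE)

print(loose)   # "akui", three spaces, "n"
print(strict)  # "akui", three spaces, "in"
```

If the real data never contains words beginning with i, the simpler pattern is enough.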
Vincent Zoonekynd answered Oct 16 '22 11:10



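There is a simple solution to this: match a whitespace character followed by b or c, and replace each match with two spaces.

    gsub("\\s[bc]", "  ", "ab b cde", perl=TRUE)

This returns "ab", four spaces, then "de", which is the output you asked for.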

Dr. Mike answered Oct 16 '22 12:10
