
R - replace part of a string using wildcards

Tags:

regex

r

I just started using R again, and I was wondering whether there is a way to replace part of a string using wildcards.

For example:

Say I have

S1 <- "aaaaaaaaa[aaaaa]aaaa[bbbbbbb]aaaa" 

and I want to replace everything within square brackets with 'x', such that the new string is

"aaaaaaaaa[x]aaaa[x]aaaa" 

Is this possible to do in R?

Please note that the content inside the square brackets can be of variable length.

dkr267 asked Dec 02 '14 09:12



2 Answers

A simple regex (written as an R string, hence the doubled backslashes) would be:

\\[.+?\\]

Example: http://regex101.com/r/xE1rL1/1

Example Usage

s1 <- 'aaaaaaaaa[aaaaa]aaaa[bbbbbbb]aaaa'
gsub("\\[.+?\\]", "[x]", s1)
## [1] "aaaaaaaaa[x]aaaa[x]aaaa"

Regular expression

  • \\[ matches the opening [

  • .+? matches one or more of any character, non-greedily, so the match stops at the first closing ] rather than the last

  • \\] matches the closing ]
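To see why the non-greedy quantifier matters here, the following is a small sketch comparing the greedy and non-greedy forms on the string from the question. The variable names greedy and lazy are only illustrative.

```r
s1 <- "aaaaaaaaa[aaaaa]aaaa[bbbbbbb]aaaa"

# Greedy: .+ extends to the LAST closing bracket, so both bracketed
# groups and the text between them are swallowed by a single match.
greedy <- gsub("\\[.+\\]", "[x]", s1)

# Non-greedy: .+? stops at the FIRST closing bracket, so each
# bracketed group is matched and replaced separately.
lazy <- gsub("\\[.+?\\]", "[x]", s1)

print(greedy)
## [1] "aaaaaaaaa[x]aaaa"
print(lazy)
## [1] "aaaaaaaaa[x]aaaa[x]aaaa"
```

With the greedy form, the "aaaa" between the two bracketed groups is lost, which is not what the question asks for.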

EDIT

To also handle the case where nothing is present inside the [], change + (one or more) to * (zero or more):

s1 <- 'aaaaaaaaa[]aaaa[bbbbbbb]aaaa'
gsub("\\[.*?\\]", "[x]", s1)
## [1] "aaaaaaaaa[x]aaaa[x]aaaa"
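As an alternative that does not rely on non-greedy matching, a negated character class can be used instead. This is a sketch of that variant, not part of the original answer: [^]]* means "zero or more characters that are not ]", so the match cannot run past the first closing bracket. The names s2 and res are illustrative.

```r
s1 <- "aaaaaaaaa[]aaaa[bbbbbbb]aaaa"
s2 <- "aaaaaaaaa[aaaaa]aaaa[bbbbbbb]aaaa"

# [^]]* matches any run of characters other than ']', including an
# empty run, so both empty and non-empty brackets are handled.
res <- gsub("\\[[^]]*\\]", "[x]", c(s1, s2))

print(res)
## [1] "aaaaaaaaa[x]aaaa[x]aaaa" "aaaaaaaaa[x]aaaa[x]aaaa"
```

Because gsub() is vectorised over its input, the same call handles a whole character vector at once.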
nu11p01n73R answered Oct 16 '22 15:10


You could also try the qdapRegex package, which has a dedicated function for this kind of problem: rm_square

library(qdapRegex)
S1 <- "aaaaaaaaa[aaaaa]aaaa[bbbbbbb]aaaa" 
rm_square(S1, replacement = "[x]")
## [1] "aaaaaaaaa[x]aaaa[x]aaaa"

It works the same way for empty brackets:

S1 <- "aaaaaaaaa[]aaaa[bbbbbbb]aaaa" 
rm_square(S1, replacement = "[x]")
## [1] "aaaaaaaaa[x]aaaa[x]aaaa"
David Arenburg answered Oct 16 '22 14:10