I have a string: txt = "a patterned layer within a microelectronic pattern." I would like to replace the exact word "pattern" with "form", so I wrote this code:
txt_replaced = gsub("pattern","form",txt)
However, the result in txt_replaced is: "a formed layer within a microelectronic form."
As you can see, "patterned" is wrongly changed to "formed" because the first part of "patterned" matches "pattern".
Is there a way to replace only exact matches using gsub()? That is, only the whole word "pattern" should be replaced.
The result I want is: "a patterned layer within a microelectronic form."
Many thanks!
As @koshke noted, a very similar question has been answered before (by me). But that was about grep and this is about gsub, so I'll answer it again:
"\<" is an escape sequence that matches the beginning of a word, and "\>" matches the end. In R strings you need to double the backslashes, so:
txt <- "a patterned layer within a microelectronic pattern."
txt_replaced <- gsub("\\<pattern\\>","form",txt)
txt_replaced
# [1] "a patterned layer within a microelectronic form."
Or you could use \b instead of \< and \>. \b matches a word boundary, so it can be used at both ends:
txt_replaced <- gsub("\\bpattern\\b","form",txt)
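As a quick check, here is a small sketch that runs both patterns on the example string and compares the results. The names with_angle and with_b are only illustrative, and it assumes the default regex engine, as in the calls above (no perl = TRUE):

```r
txt <- "a patterned layer within a microelectronic pattern."

# Word-start / word-end anchors
with_angle <- gsub("\\<pattern\\>", "form", txt)

# Generic word-boundary anchor on both sides
with_b <- gsub("\\bpattern\\b", "form", txt)

# Both leave "patterned" untouched and replace only the standalone word
identical(with_angle, with_b)
# [1] TRUE
with_b
# [1] "a patterned layer within a microelectronic form."
```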
Also note that if you want to replace only ONE occurrence, you should use sub instead of gsub.
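To illustrate the difference, here is a minimal sketch on a made-up string that contains the word twice (txt2, first_only and every_match are hypothetical names, not from the question):

```r
# Made-up example string with two standalone occurrences of "pattern"
txt2 <- "a pattern inside a pattern"

# sub() replaces only the first match
first_only <- sub("\\bpattern\\b", "form", txt2)
first_only
# [1] "a form inside a pattern"

# gsub() replaces every match
every_match <- gsub("\\bpattern\\b", "form", txt2)
every_match
# [1] "a form inside a form"
```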