Suppose I have a string like the one below:
<a>b<c>
I want to remove both <a> and <c>, but I can't use gsub("<.*>","","<a>b<c>") as this will remove the b also.
I asked a similar question before, but on second thought, I think I should learn, in general, how to deal with this kind of problem. Thanks.
How to remove a character or multiple characters from a string in R? You can use either the base R function gsub() or str_replace_all() from the stringr package to remove characters from a string or text. Note that str_replace() replaces only the first match in each string.
To remove a character from an R data frame column, we can use the gsub function, which replaces the character with an empty string. For example, if we have a data frame called df that contains a character column x whose values each contain the characters ID, they can be removed by using the command gsub("ID","",as.
The gsub() function in R is used for replacement operations. It takes a pattern, a replacement, and an input character vector, and substitutes every match of the pattern with the replacement. By default, gsub() treats the pattern as a regular expression; pass fixed = TRUE to match it as a literal string instead.
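As a small sketch of the behaviour described above (the vector x and its values are made up for illustration, not taken from the original post), the following shows gsub() removing a literal prefix, and how the fixed argument changes what a pattern such as "." means:

```r
# Hypothetical example data: each value carries an "ID" prefix
x <- c("ID101", "ID202")

# Replace every occurrence of "ID" with the empty string
ids <- gsub("ID", "", x)

# By default the pattern is a regular expression, so "." matches any character
regex_dot <- gsub(".", "", "a.b.c")

# With fixed = TRUE the pattern is a literal string, so only real dots are removed
literal_dot <- gsub(".", "", "a.b.c", fixed = TRUE)

print(ids)          # "101" "202"
print(regex_dot)    # ""
print(literal_dot)  # "abc"
```

If the pattern contains no regex metacharacters, both modes give the same result; fixed = TRUE mainly matters when the text to remove includes characters like ".", "(" or "[".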
Don't allow a closing bracket > in the stuff between the brackets:
z <- "<a>b<c>"
gsub("<[^>]+>","",z)
You can use a non-greedy regex, e.g. /<.*?>/.
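In R, the non-greedy pattern is written without the surrounding slashes. A minimal sketch on the string from the question, comparing the greedy and non-greedy quantifiers (the variable names are only for illustration):

```r
z <- "<a>b<c>"

# Greedy: ".*" runs from the first "<" to the last ">", so everything is removed
greedy <- gsub("<.*>", "", z)

# Non-greedy: ".*?" stops at the first ">" after each "<"
lazy <- gsub("<.*?>", "", z)

# The same pattern also works with the PCRE engine
lazy_perl <- gsub("<.*?>", "", z, perl = TRUE)

print(greedy)     # ""
print(lazy)       # "b"
print(lazy_perl)  # "b"
```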
This will only work for simple HTML and can be easily subverted. Consider the following HTML, which cannot easily be removed using regular expressions.
<span title="Help > Index">
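To see the problem concretely, here is a sketch that applies the bracket-based pattern to that tag. The inner text and the closing </span> are added here only to make a complete example; they are not part of the original snippet.

```r
# Assumed example: the tag above, plus hypothetical content and a closing tag
html <- '<span title="Help > Index">text</span>'

# The pattern stops at the ">" inside the attribute value,
# so part of the attribute leaks into the result
stripped <- gsub("<[^>]+>", "", html)

print(stripped)  # " Index\">text"
```

The non-greedy pattern <.*?> fails in the same way, since it also stops at the first >. For real-world HTML, a proper HTML parser is a safer choice than a regular expression.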