Suppose I have some text like this:
text<-c("[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It's harder to save. It's harder to retire. [McCain]: The biggest problem with American healthcare system is that it costs too much. [Obama]: We will have a healthcare system, not a disease-care system. We have the chance to solve problems that we've been talking about... [Text on screen]: Senators McCain and Obama are talking about your healthcare and financial security. We need more than talk. [Obama]: ...year after year after year after year. [Announcer]: Call and make sure their talk turns into real solutions. AARP is responsible for the content of this advertising.")
and I would like to remove (that is, get rid of) all of the text between the [ and ], along with the brackets themselves. What's the best way to do this? Here is my feeble attempt using regex and the stringr package:
str_extract(text, "\\[[a-z]*\\]")
Thanks for any help!
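Before removing anything, it can help to check which bracketed tags a pattern actually catches. The sketch below is a hedged illustration on a shortened, made-up excerpt of the text above. It uses base R's gregexpr() and regmatches() (swapped in for stringr so it runs without extra packages). The pattern allows any run of characters other than ] between the brackets, so it also matches multi-word tags such as [Text on screen], which a lowercase-only class like [a-z]* cannot. stringr users can get the same list from str_extract_all().

```r
# Shortened, illustrative version of the question's text
text <- "[McCain]: Taxes. [Obama]: Savings. [Text on screen]: Talk. [Announcer]: Call."

# '\\[' is a literal '[', '[^]]*' is any run of characters other than ']'
# (a ']' placed first in a bracket expression is taken literally), and the
# final ']' is a literal closing bracket
tags <- regmatches(text, gregexpr("\\[[^]]*]", text))[[1]]
print(tags)
```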
If you want to remove the [] and the () you can use this code:
>>> import re
>>> x = "This is a sentence.
Method 1: Use the sub() function from Python's re (regular expressions) module. sub() finds each occurrence of a pattern and replaces it with a given string. Here it finds the substring inside the brackets or parentheses and replaces it with empty brackets.
You can do this with gsub():
gsub("\\[[^\\]]*\\]", "", text, perl = TRUE)
What the regex means:
\[       # '['
[^\]]*   # any character except ']', 0 or more times (greedy: matches as many as possible)
\]       # ']'
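A minimal sketch of the gsub() call applied to a shortened, made-up excerpt of the question's text. It also shows an equivalent pattern that should work without perl = TRUE, plus an optional extra step (an assumption on my part, not part of the answer) that strips the ": " left behind after each tag and tidies the spacing.

```r
# Shortened, illustrative excerpt of the question's text
text <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# The answer's PCRE pattern: '[' then any run of non-']' characters, then ']'
no_tags <- gsub("\\[[^\\]]*\\]", "", text, perl = TRUE)

# Same idea with R's default (POSIX extended) regex engine: a ']' placed
# right after '^' in a bracket expression is literal, so no escaping is needed
no_tags_posix <- gsub("\\[[^]]*]", "", text)

# Optional extra step (not in the answer): also drop an optional ':' and the
# whitespace after each tag, then trim and collapse any repeated spaces
clean <- gsub("\\s+", " ", trimws(gsub("\\[[^]]*]:?\\s*", "", text)))

print(no_tags)
print(clean)
```

The plain removal keeps the colons that followed each speaker tag, which may or may not be what you want; the extra step is only one way to handle them.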