I have the following string, stored in the object sentence:
sentence <- "aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013 http://t.co/tkuCRfLy \" $AAPL vs $AAPL \" August 2011 http://t.co/863HkVjn"
I am trying to use gsub to remove URLs beginning with http:
sentence <- gsub('http.*','',sentence)
However, it removes everything from http to the end of the string:
aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013
What I want is:
aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013 \" $AAPL vs $AAPL \" August 2011
I am trying to clean up the URLs, so if a string includes http I want to remove only that URL and keep the rest of the text. I found some solutions, but they are not helping me.
A first attempt is to add a trailing space to the pattern:
gsub('http.* *', '', sentence)
Or use \\s, which is the regex class for whitespace:
gsub('http.*\\s*', '', sentence)
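As per the comment, however, neither of these fixes the problem: .* matches any character, including spaces, and regex quantifiers are greedy, so the match still runs to the end of the string. Instead, match http followed by one or more non-whitespace characters, then zero or more whitespace characters: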
gsub('http\\S+\\s*', '', sentence)
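To see the difference side by side, here is a small sketch using the string from the question. The variable names greedy and cleaned are only illustrative, and the final trimws() call (base R 3.2.0 or later) is an addition not in the original answer; it is there because removing the last URL leaves a trailing space behind.

```r
# The string from the question
sentence <- "aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013 http://t.co/tkuCRfLy \" $AAPL vs $AAPL \" August 2011 http://t.co/863HkVjn"

# Greedy: .* runs to the end of the string, so everything after
# the first "http" is lost, including the second quoted title
greedy <- gsub("http.*", "", sentence)

# Bounded: \\S+ stops at the first whitespace character, so only
# each URL (plus any whitespace right after it) is removed
cleaned <- gsub("http\\S+\\s*", "", sentence)

# Removing the final URL leaves a trailing space; strip it
# (trimws() is an extra step assumed here, not part of the answer)
cleaned <- trimws(cleaned)

print(greedy)
print(cleaned)
```

Note that the pattern removes any run of non-whitespace characters starting with http, so a word such as "httpd" would be removed as well. If that matters for your data, a stricter prefix such as "https?://\\S+\\s*" may be a safer choice.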