I have a vector of strings, `myStrings`, in R that look something like:
[1] download file from `http://example.com`
[2] this is the link to my website `another url`
[3] go to `another url` from more info.
where `another url` is a valid http URL, but Stack Overflow will not let me insert more than one URL, which is why I'm writing `another url` instead. I want to remove all the URLs from `myStrings` so that it looks like:
[1] download file from
[2] this is the link to my website
[3] go to from more info.
I've tried many functions in the `stringr` package, but nothing works.
In Python, the `re.sub()` method can remove URLs from text, e.g. `result = re.sub(r'http\S+', '', my_string)`. It replaces every match of the pattern with an empty string.
To remove only the `http://` or `https://` prefix from a URL in JavaScript, call the string's `replace()` method with the regular expression `/^https?:\/\//` and an empty string as arguments. `replace()` returns a new string with the prefix removed.
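Since the question is about R, the `replace()` recipe above for stripping the `http://` or `https://` prefix can be sketched with base R's `sub()`. This is only an illustration: the `urls` vector is made up, and `sub()` is used rather than `gsub()` because the anchored pattern can match at most once per string.

```r
# Hypothetical input; these URLs are illustrative only.
urls <- c("http://example.com", "https://www.google.com/", "example.org")

# "^https?://" anchors at the start of the string and matches
# either "http://" or "https://"; strings without it are unchanged.
stripped <- sub("^https?://", "", urls)
print(stripped)
```

Note that this keeps the rest of the URL; it does not remove the URL from surrounding text.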
str1 <- c("download file from http://example.com", "this is the link to my website https://www.google.com/ for more info")
gsub('http\\S+\\s*',"", str1)
#[1] "download file from "
#[2] "this is the link to my website for more info"
library(stringr)
str_trim(gsub('http\\S+\\s*',"", str1)) #removes trailing/leading spaces
#[1] "download file from"
#[2] "this is the link to my website for more info"
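One caveat: `http\\S+` also matches any word that merely begins with `http`, such as `httpd`. A slightly stricter sketch, assuming only `http://` and `https://` links need removing, requires the `://` separator and uses base R's `trimws()` in place of `str_trim()`. The helper name `remove_urls` and the third input string are made up for illustration.

```r
# Hypothetical helper: drop http(s) URLs, then trim both ends.
remove_urls <- function(x) {
  # Requiring "://" leaves ordinary words starting with "http" alone.
  no_urls <- gsub("https?://\\S+\\s*", "", x)
  trimws(no_urls)
}

str1 <- c("download file from http://example.com",
          "this is the link to my website https://www.google.com/ for more info",
          "restart the httpd service")
cleaned <- remove_urls(str1)
print(cleaned)
```

With the original pattern, the third string would lose the word `httpd`; here it should pass through unchanged.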
In order to match `ftp` as well, I would use the same idea from @Richard Scriven's post:
str1 <- c("download file from http://example.com", "this is the link to my website https://www.google.com/ for more info",
"this link to ftp://www.example.org/community/mail/view.php?f=db/6463 gives more info")
gsub('(f|ht)tp\\S+\\s*',"", str1)
#[1] "download file from "
#[2] "this is the link to my website for more info"
#[3] "this link to gives more info"
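If further schemes need to be covered, the alternation can be built from a vector instead of being written by hand. The following is a sketch under assumptions: `remove_scheme_urls` and its `schemes` argument are hypothetical names, and the pattern requires the `://` separator, which is stricter than `(f|ht)tp\\S+` above.

```r
# Hypothetical helper: remove URLs for a configurable set of schemes.
remove_scheme_urls <- function(x, schemes = c("https?", "ftp")) {
  # Builds a pattern such as "(https?|ftp)://\\S+\\s*".
  pattern <- paste0("(", paste(schemes, collapse = "|"), ")://\\S+\\s*")
  # Remove every match, then trim leading/trailing whitespace.
  trimws(gsub(pattern, "", x))
}

str2 <- c("download file from http://example.com",
          "this link to ftp://www.example.org/community/mail/view.php?f=db/6463 gives more info")
cleaned2 <- remove_scheme_urls(str2)
print(cleaned2)
```

Each entry in `schemes` is treated as a regular-expression fragment, so entries containing special characters would need escaping.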