
Remove URLs from string

Tags: string, r, stringr

I have a vector of strings—myStrings—in R that look something like:

[1] download file from `http://example.com`
[2] this is the link to my website `another url`
[3] go to `another url` from more info.

where `another url` stands for a valid HTTP URL (Stack Overflow will not let me insert more than one URL, so I'm writing `another url` as a placeholder). I want to remove all the URLs from myStrings so that the result looks like:

[1] download file from
[2] this is the link to my website
[3] go to from more info.

I've tried many functions in the stringr package but nothing works.

Asked by Tavi on Aug 17 '14 at 18:08




1 Answer

 str1 <- c("download file from http://example.com", "this is the link to my website https://www.google.com/ for more info")

 gsub('http\\S+\\s*',"", str1)
 #[1] "download file from "                         
 #[2] "this is the link to my website for more info"

 library(stringr)
 str_trim(gsub('http\\S+\\s*',"", str1)) #removes trailing/leading spaces
 #[1] "download file from"                          
 #[2] "this is the link to my website for more info"
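
Since the question mentions stringr, the same idea can probably be written with stringr alone. This is only a sketch, assuming stringr 1.3.0 or later (where `str_remove_all()` and `str_squish()` are available); the variable name `cleaned` is just for illustration.

```r
library(stringr)

str1 <- c("download file from http://example.com",
          "this is the link to my website https://www.google.com/ for more info")

# Remove every run of non-space characters that starts with "http",
# then collapse repeated spaces and trim both ends of each string.
cleaned <- str_squish(str_remove_all(str1, "http\\S+"))
cleaned
#[1] "download file from"
#[2] "this is the link to my website for more info"
```

Because `str_squish()` takes care of the leftover whitespace, the pattern does not need the trailing `\\s*` here.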

Update

In order to match ftp, I would use the same idea from @Richard Scriven's post

  str1 <- c("download file from http://example.com", "this is the link to my website https://www.google.com/ for more info",
  "this link to ftp://www.example.org/community/mail/view.php?f=db/6463 gives more info")


  gsub('(f|ht)tp\\S+\\s*',"", str1)
  #[1] "download file from "                         
  #[2] "this is the link to my website for more info"
  #[3] "this link to gives more info"     
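
One caveat with `(f|ht)tp\\S+`: it removes any token that contains `http` or `ftp`, so a word such as `httpd` would be dropped too. If that matters for your data, a slightly stricter variant is to require the `://` separator. The following is a rough base R sketch, not a full URL parser; `strip_urls` is a hypothetical helper name, and the third test string is made up for illustration.

```r
# Hypothetical helper: drop http, https and ftp links, then tidy whitespace.
strip_urls <- function(x) {
  # \\b anchors the scheme at a word boundary; "://" must follow it.
  no_urls <- gsub("\\b(?:https?|ftp)://\\S+", "", x, perl = TRUE)
  # Collapse runs of whitespace left behind and trim both ends.
  trimws(gsub("\\s+", " ", no_urls))
}

str1 <- c("download file from http://example.com",
          "this link to ftp://www.example.org/community/mail/view.php?f=db/6463 gives more info",
          "restart httpd after editing the config")

result <- strip_urls(str1)
result
#[1] "download file from"
#[2] "this link to gives more info"
#[3] "restart httpd after editing the config"
```

Note that punctuation attached directly to a link (for example a trailing full stop) is still removed along with it, since `\\S+` consumes everything up to the next space.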

Answered by akrun on Nov 04 '22 at 08:11