
How do you extract a URL from a string using Python?

For example:

string = "This is a link http://www.google.com" 

How could I extract 'http://www.google.com'?

(Each link will have the same format, i.e. it starts with 'http://'.)

Sheldon asked Mar 18 '12 17:03



1 Answer

There may be a few ways to do this, but the cleanest is to use a regular expression with the re module:

>>> import re
>>> myString = "This is a link http://www.google.com"
>>> print(re.search(r"(?P<url>https?://[^\s]+)", myString).group("url"))
http://www.google.com
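One thing to watch for: re.search returns None when the string contains no link, so calling .group() directly would raise an AttributeError. A minimal sketch of a guarded version follows; the names URL_PATTERN and first_url are made up for illustration and are not part of the original answer.

```python
import re

# Same idea as the pattern above: "http://" or "https://" followed by
# any run of non-whitespace characters (\S is equivalent to [^\s]).
URL_PATTERN = re.compile(r"https?://\S+")

def first_url(text):
    """Return the first http(s) URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None

print(first_url("This is a link http://www.google.com"))
print(first_url("No links here"))
```

The first call prints the URL and the second prints None instead of raising.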

If the string can contain multiple links, use re.findall, which returns every match as a list:

>>> import re
>>> myString = "These are the links http://www.google.com  and http://stackoverflow.com/questions/839994/extracting-a-url-in-python"
>>> print(re.findall(r'(https?://[^\s]+)', myString))
['http://www.google.com', 'http://stackoverflow.com/questions/839994/extracting-a-url-in-python']
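Because the pattern matches everything up to the next whitespace, a link at the end of a sentence would pick up the trailing comma or full stop. The sketch below strips common trailing punctuation after matching. The helper name extract_urls and the TRAILING character set are assumptions for illustration, and a URL that legitimately ends in one of those characters (a closing parenthesis, say) would be cut short, so adjust the set to your data.

```python
import re

# "http://" or "https://" followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")

# Characters assumed to be sentence punctuation rather than part of the URL.
TRAILING = ".,;:!?)\"'"

def extract_urls(text):
    """Return all http(s) URLs in text with trailing punctuation stripped."""
    return [url.rstrip(TRAILING) for url in URL_PATTERN.findall(text)]

sample = ("See http://www.google.com, then "
          "http://stackoverflow.com/questions/839994/extracting-a-url-in-python.")
print(extract_urls(sample))
```

For the sample string this prints the same two URLs as the findall call above, without the comma and the final full stop.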
Abhijit answered Sep 20 '22 20:09