How can I read the contents of a URL with Python?

Tags: python

The following works when I paste it into the browser:

http://www.somesite.com/details.pl?urn=2344 

But when I try reading the URL with Python, nothing happens:

    link = 'http://www.somesite.com/details.pl?urn=2344'
    f = urllib.urlopen(link)
    myfile = f.readline()
    print myfile

Do I need to encode the URL, or is there something I'm not seeing?

Asked by Helen Neely, Feb 28 '13 at 14:02



2 Answers

To answer your question:

    import urllib

    link = "http://www.somesite.com/details.pl?urn=2344"
    f = urllib.urlopen(link)
    myfile = f.read()
    print(myfile)

You need to call read(), not readline(): readline() returns only the first line of the response, while read() returns the whole body.
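To make the difference concrete, here is a small offline sketch. It uses io.BytesIO as a stand-in for the object that urlopen() returns, and it assumes the page begins with a blank line, which is one possible reason readline() appeared to print nothing. The body content is invented for the demo.

```python
import io

# Stand-in for a urlopen() response: a bytes stream whose first line is
# blank (an assumption for this demo, not the real page content).
body = b"\n<html>\n<body>details</body>\n</html>\n"

first_line = io.BytesIO(body).readline()  # stops after the first newline
everything = io.BytesIO(body).read()      # reads to the end of the stream

print(repr(first_line))   # only the leading newline
print(everything == body) # the full document
```

If the real response starts with an empty line, printing the result of readline() shows just that empty line, which is easy to mistake for no output at all.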

EDIT (2018-06-25): Since Python 3, the legacy urllib.urlopen() was replaced by urllib.request.urlopen() (see notes from https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you're using Python 3, see the answers by Martin Thoma or i.n.n.m within this question:

https://stackoverflow.com/a/28040508/158111 (Python 2/3 compat)
https://stackoverflow.com/a/45886824/158111 (Python 3)
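If the same script has to run on both Python 2 and Python 3, one common pattern is a guarded import. This is a minimal sketch of that pattern, not necessarily what the linked answers do:

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is a top-level function in urllib
    from urllib import urlopen

# After this, urlopen(link).read() works on either version.
```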

Or, just get this library here: http://docs.python-requests.org/en/latest/ and seriously use it :)

    import requests

    link = "http://www.somesite.com/details.pl?urn=2344"
    f = requests.get(link)
    print(f.text)
Answered by woozyking, Oct 04 '22 at 22:10


For Python 3 users, to save time, use the following code:

    from urllib.request import urlopen

    link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

    f = urlopen(link)
    myfile = f.read()
    print(myfile)

I know there are other threads about the error "NameError: name 'urlopen' is not defined", but I thought this might save time.
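One thing to keep in mind with the Python 3 version: read() returns bytes, so print() shows a b'...' literal rather than text. A minimal sketch of decoding it follows. The helper name fetch_text, the 10-second timeout, and the UTF-8 fallback are my own assumptions, not part of the answer above.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10, default_charset="utf-8"):
    """Return the body of `url` decoded to str (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3
        # Prefer the charset named in the Content-Type header, if any;
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_charset
    # errors="replace" avoids an exception on bytes that do not decode.
    return raw.decode(charset, errors="replace")


# Offline demo using a data: URL, which urlopen also accepts.
print(fetch_text("data:text/plain;charset=utf-8,hello%20world"))
```

For a real page, pass the http or https link instead, for example fetch_text(link) with the link variable from the snippet above.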

Answered by i.n.n.m, Oct 05 '22 at 00:10