Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given a URL to a text file, what is the simplest way to read the contents of the text file?

Tags:

python

Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2

Actually the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

You don't even need "readlines", as Will suggested. You could even shorten it to: *

import urllib2

for line in urllib2.urlopen(target_url):
    print line

But remember in Python, readability matters.

However, this is the simplest way but not the safe way because most of the time with network programming, you don't know if the amount of data to expect will be respected. So you'd generally better read a fixed and reasonable amount of data, something you know to be enough for the data you expect but will prevent your script from been flooded:

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines

for line in data:
    print line

* Second example in Python 3:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is

I'm a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

or alternatively

from urllib.request import urlopen
data = urlopen(target_url)

Note that just import urllib does not work.


The requests library has a simpler interface and works with both Python 2 and 3.

import requests

response = requests.get(target_url)
data = response.text

There's really no need to read line-by-line. You can get the whole thing like this:

import urllib
txt = urllib.urlopen(target_url).read()