I'm using html2text in Python to get the raw text (tags included) of an HTML page from any URL, but I'm getting an error.
My code:
import html2text
import urllib2
proxy = urllib2.ProxyHandler({'http': 'http://<proxy>:<pass>@<ip>:<port>'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
print html2text.html2text(html)
The error:
Traceback (most recent call last):
File "t.py", line 8, in <module>
html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
Can anyone explain what I'm doing wrong?
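For anyone diagnosing the same failure: [Errno 110] usually means the proxy host/port could not be reached at all (wrong address, or blocked by a firewall), not a problem with html2text. A minimal sketch of making that visible, ported to Python 3's urllib.request names for illustration — the proxy address below is a placeholder, substitute your real <user>:<pass>@<ip>:<port>:

import socket
import urllib.error
import urllib.request

# Placeholder proxy on an unroutable TEST-NET address; replace with yours.
proxy = urllib.request.ProxyHandler({'http': 'http://user:secret@192.0.2.1:8080'})
opener = urllib.request.build_opener(proxy, urllib.request.HTTPBasicAuthHandler())
urllib.request.install_opener(opener)

url = "http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851"
try:
    html = urllib.request.urlopen(url, timeout=10).read()
except urllib.error.URLError as e:
    # "[Errno 110] Connection timed out" surfaces here as e.reason:
    # the connection to the proxy itself never succeeded.
    print("request failed:", e.reason)
except socket.timeout:
    print("request timed out")

With an explicit timeout= and the URLError caught, the script reports the proxy failure instead of hanging and crashing.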
If you don't require SSL, this script should work in Python 2.7.x:
import urllib
url = "http://stackoverflow.com"
f = urllib.urlopen(url)
print f.read()
In Python 3.x, use urllib.request instead of urllib, because urllib2 was Python 2 only; in Python 3 it was merged into urllib. Note that the http:// prefix is required.
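The Python 3 version of the same fetch, for comparison — a sketch of the rename described above:

# Python 3: urllib.urlopen became urllib.request.urlopen
# after urllib and urllib2 were merged into the urllib package.
import urllib.request

url = "http://stackoverflow.com"
with urllib.request.urlopen(url) as f:
    # Python 3 returns bytes, so decode before printing.
    print(f.read().decode("utf-8", errors="replace"))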
EDIT: In 2020, you should use the third-party module requests instead. requests can be installed with pip.
import requests
print(requests.get("http://stackoverflow.com").text)