Replacement for htmllib module in Python 3.0

I want to use the htmllib module, but it has been removed in Python 3.0. Does anyone know what replaces it?

Abbas asked Apr 28 '10 15:04





2 Answers

It is superseded by HTMLParser, which Python 3 renamed to html.parser; see the Python library reorganization.

mmmmmm answered Nov 15 '22 20:11



I haven't used it, but it looks like what you want is the html.parser module, and possibly also html.entities.
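For illustration, here is a minimal sketch of how the standard-library html.parser module can stand in for htmllib in Python 3, with html.entities used for entity lookups. The class name LinkCollector and the sample markup are made up for this example; only HTMLParser, its handler methods, and name2codepoint come from the standard library.

```python
from html.parser import HTMLParser
from html.entities import name2codepoint


class LinkCollector(HTMLParser):
    """Hypothetical subclass: records link targets and text content."""

    def __init__(self):
        super().__init__()
        self.links = []  # href values found on <a> tags
        self.text = []   # chunks of character data between tags

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # Character references such as &amp; arrive already decoded,
        # because convert_charrefs defaults to True
        self.text.append(data)


# Sample input invented for this sketch
parser = LinkCollector()
parser.feed('<p>Fish &amp; chips: <a href="https://example.com/menu">menu</a></p>')
parser.close()

links = parser.links
text = "".join(parser.text)

# html.entities maps entity names to Unicode code points
eacute = chr(name2codepoint["eacute"])
```

Subclassing and overriding the handle_* methods is the same event-driven style htmllib used, so porting is mostly a matter of moving logic into the new handlers. For lenient parsing of badly broken pages, a third-party parser may be a better fit than this one.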

Adam Rosenfield answered Nov 15 '22 18:11
