I want to use the htmllib module, but it was removed in Python 3.0. Does anyone know what replaces it?
Deprecated since version 2.6: The htmllib module has been removed in Python 3. This module defines a class which can serve as a base for parsing text files formatted in the HyperText Mark-up Language (HTML).
html5lib works on CPython 2.7+, CPython 3.5+ and PyPy, and can be installed with pip. The goal is to support a (non-strict) superset of the versions that pip supports. Some third-party libraries may optionally be used for additional functionality.
html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.
When using html5lib with urllib2 (Python 2) or urllib.request (Python 3), the charset from the HTTP headers should be passed into html5lib through the `transport_encoding` argument of `html5lib.parse`. To have more control over the parser, create a parser object explicitly. For instance, to make the parser raise exceptions on parse errors, construct it as `html5lib.HTMLParser(strict=True)` and call its `parse` method.
It is superseded by HTMLParser; see the Python library reorganization.
I haven't used it, but it looks like what you want is the html.parser library, and possibly also html.entities.
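For anyone landing here, a minimal sketch of what that might look like with only the standard library. The `LinkCollector` class and the sample markup are made up for illustration; `HTMLParser`, `handle_starttag`, `handle_data` and `name2codepoint` are the standard-library names. It is probably not a drop-in replacement for whatever you did with htmllib, just a starting point.

```python
from html.parser import HTMLParser
from html.entities import name2codepoint


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values and visible text."""

    def __init__(self):
        # convert_charrefs=True turns references such as &eacute; into
        # the corresponding characters before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only chunks that contain something besides whitespace.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


collector = LinkCollector()
collector.feed('<p>Caf&eacute; <a href="https://example.com/">home</a></p>')
collector.close()

print(collector.links)
print(collector.text)

# html.entities maps entity names to code points if you need to
# resolve them yourself.
print(chr(name2codepoint["eacute"]))
```

Subclassing and overriding the `handle_*` methods is the same general pattern htmllib used, so porting is mostly a matter of renaming handlers. If you need browser-grade handling of broken markup, html5lib is likely the better fit.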