After I installed BeautifulSoup, this warning appears whenever I run my Python script in cmd.
D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. To get rid of this warning, change this: BeautifulSoup([your markup]) to this: BeautifulSoup([your markup], "html.parser")
I have no idea why it appears or how to get rid of it.
BeautifulSoup is not a real HTML parser but uses regular expressions to dive through tag soup. It is therefore more forgiving in some cases and less good in others. It is not uncommon that lxml/libxml2 parses and fixes broken HTML better, but BeautifulSoup has superior support for encoding detection.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
The poorly-formed stuff you saw on the Web was referred to as "tag soup", and only a web browser could parse it. Beautiful Soup started out as an HTML parser that would take tag soup and make it beautiful, or at least workable.
The official name of PyPI's Beautiful Soup Python package is beautifulsoup4. The bs4 package on PyPI is a placeholder that ensures that if you type pip install bs4 by mistake you will end up with Beautiful Soup.
The solution is stated in the warning message itself (it is a warning, not an error). Code like the following does not specify which parser (HTML, XML, etc.) to use:
BeautifulSoup( ... )
To get rid of the warning, specify which parser you'd like to use, like so:
BeautifulSoup( ..., "html.parser" )
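As a runnable version of the fix above, here is a minimal sketch. The helper name make_soup is made up for illustration and is not part of Beautiful Soup. It assumes the beautifulsoup4 package is installed, and it defaults to "html.parser" because that parser ships with Python.

```python
def make_soup(markup, parser="html.parser"):
    """Parse markup with an explicitly named parser.

    make_soup is a hypothetical helper for illustration only.
    Naming the parser is what stops the UserWarning from being shown.
    """
    # Imported inside the function so bs4 is only needed when it is called.
    from bs4 import BeautifulSoup  # installed with: pip install beautifulsoup4

    return BeautifulSoup(markup, parser)
```

For example, make_soup("<p>Hello</p>").p.get_text() gives "Hello", and no parser warning is printed because the parser is named explicitly.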
You can also install a third-party parser if you'd like.
The documentation recommends that you install and use lxml for speed.
BeautifulSoup(html, "lxml")
If you’re using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it’s essential that you install lxml or html5lib, because Python’s built-in HTML parser is just not very good in older versions.
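If the same script has to run on machines with and without lxml, you could pick the parser name at run time. This is only a sketch: pick_parser is a hypothetical helper, not something Beautiful Soup provides, and it uses only the standard library to check what is installed.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed third-party parser name, else "html.parser".

    pick_parser is a hypothetical helper, not part of Beautiful Soup.
    """
    for name in preferred:
        # find_spec returns None when the module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    # The built-in parser ships with Python, so it is always available.
    return "html.parser"


parser = pick_parser()
print(parser)
```

You would then call BeautifulSoup(html, parser). As the warning itself points out, different parsers may behave differently on the same markup, so if you need identical results everywhere it is safer to hard-code one parser name.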
Installing the lxml parser
On Ubuntu (Debian-based)
apt-get install python-lxml
On Fedora (RHEL-based)
dnf install python-lxml
Using pip
pip install lxml