How to get rid of BeautifulSoup user warning?

After installing BeautifulSoup, I get this warning whenever I run my Python script from cmd:

D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

  BeautifulSoup([your markup])

to this:

  BeautifulSoup([your markup], "html.parser")

I have no idea why it appears or how to get rid of it.

Asked by jellyfishhuang on Nov 04 '15 at 00:11



2 Answers

The fix is stated in the warning message itself. Code like the following does not say which parser (HTML, XML, etc.) to use, so Beautiful Soup picks one for you and warns about it:

BeautifulSoup( ... ) 

To silence the warning, pass the name of the parser you want as the second argument:

BeautifulSoup( ..., "html.parser" ) 
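As a minimal sketch of that fix, the snippet below parses a small string with the parser named explicitly and records any warnings raised during the call. The sample markup and variable names are made up for illustration, and it assumes the bs4 package is installed.

```python
import warnings

from bs4 import BeautifulSoup

html = "<p>Hello, <b>world</b></p>"  # hypothetical sample markup

# Record every warning raised while the soup is built.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The second argument names the parser, so Beautiful Soup does not have to guess.
    soup = BeautifulSoup(html, "html.parser")

print(soup.b.get_text())  # world

# Collect any "No parser was explicitly specified" warnings; none are expected here.
parser_warnings = [w for w in caught if "No parser was explicitly specified" in str(w.message)]
print(parser_warnings)
```

Dropping the "html.parser" argument from the call should bring the warning back in the recorded list.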

You can also install a third-party parser, such as lxml or html5lib, and pass its name instead.

Answered by Ethan Bierlein on Sep 20 '22 at 00:09


The Beautiful Soup documentation recommends installing and using lxml for speed:

BeautifulSoup(html, "lxml") 

If you’re using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it’s essential that you install lxml or html5lib, because Python’s built-in HTML parser is just not very good in older versions.

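Installing the lxml parser

  • On Ubuntu (Debian-based)

    apt-get install python-lxml
  • On Fedora (RHEL-based)

    dnf install python-lxml
  • Using pip (any platform, including Windows)

    pip install lxml

The distribution packages named above are the Python 2 builds; for Python 3 the package is python3-lxml on both Ubuntu and Fedora.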
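If a script has to run both where lxml is installed and where it is not, one option is to choose the parser name at start-up. The sketch below uses only the standard library; pick_parser is a hypothetical helper, not part of Beautiful Soup, and it assumes that a parser's import name matches the name Beautiful Soup accepts (true for lxml and html5lib).

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed parser from `preferred`, else "html.parser".

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    for name in preferred:
        # find_spec returns None when the top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"  # ships with Python, so it is always available


parser = pick_parser()
print(parser)
```

The result can then be passed as BeautifulSoup(html, pick_parser()). As the warning text notes, different parsers can behave differently, so where output must be identical across machines it is safer to pin a single parser name.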
Answered by Gayan Weerakutti on Sep 22 '22 at 00:09