Can BeautifulSoup handle broken HTML?

Question

Accepted Answer

BeautifulSoup is not a real HTML parser; it uses regular expressions to dive through tag soup. It is therefore more forgiving in some cases and less capable in others. lxml/libxml2 often parses and repairs broken HTML better, but BeautifulSoup has superior support for encoding detection.
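To see how this plays out on your own markup, here is a minimal sketch. It assumes the `bs4` package (Beautiful Soup 4), where the underlying parser is chosen by name, and uses the standard library's `html.parser`. The sample markup and the helper names `parse_broken` and `detect_encoding` are made up for illustration. If lxml is installed you could pass `"lxml"` instead and compare the repaired trees, since different parsers may repair the same input differently.

```python
from bs4 import BeautifulSoup, UnicodeDammit

# Deliberately malformed markup (illustrative only): the <p> and <b> tags
# are never closed, and </html> is missing.
BROKEN_HTML = "<html><body><p>First paragraph<p>Second <b>bold text</body>"


def parse_broken(markup, parser="html.parser"):
    """Parse possibly broken markup with the named parser and return the soup."""
    return BeautifulSoup(markup, parser)


def detect_encoding(raw_bytes, candidates=()):
    """Decode raw bytes with bs4's UnicodeDammit.

    `candidates` is an optional list of encodings to try first.
    Returns (decoded_text, name_of_the_encoding_that_worked).
    """
    dammit = UnicodeDammit(raw_bytes, list(candidates))
    return dammit.unicode_markup, dammit.original_encoding


if __name__ == "__main__":
    soup = parse_broken(BROKEN_HTML)
    # Inspect how the parser repaired the tree.
    print(soup.prettify())
    # The text is still reachable even though the tags were never closed.
    print(soup.get_text(" ", strip=True))

    text, encoding = detect_encoding("café".encode("latin-1"), ["latin-1"])
    print(encoding, text)
```

Printing `soup.prettify()` for each parser you have installed is usually the quickest way to judge which one copes best with a particular kind of breakage.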
