Can I change BeautifulSoup's behavior regarding converting XML tags to lowercase?

I'm working on code to parse a configuration file written in XML, where the XML tags are mixed case and the case is significant. Beautiful Soup appears to convert XML tags to lowercase by default, and I would like to change this behavior.

I'm not the first to ask a question on this subject [see here]. However, I did not understand the answer given to that question, and in BeautifulSoup-3.1.0.1, BeautifulSoup.py does not appear to contain any instances of "encodedName" or "Tag.__str__".

Rob Carr asked May 21 '09 07:05


2 Answers

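import html5lib
from html5lib import treebuilders

# Build a BeautifulSoup-style tree through html5lib's XML parser.
parser = html5lib.XMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))

# Use a context manager so the file is closed once parsing finishes.
with open("mydocument.html") as f:
    document = parser.parse(f)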

'document' is now a BeautifulSoup-like tree, but retains the cases of tags. See html5lib for documentation and installation.
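If installing html5lib is not an option, a rough alternative is the standard library's xml.etree.ElementTree, which also keeps tag names exactly as written. This is only a sketch and not part of Beautiful Soup; the XML snippet and the tag names in it (ServerConfig, HostName, portNumber) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-case configuration, used only for illustration.
xml_text = (
    "<ServerConfig>"
    "<HostName>example.org</HostName>"
    "<portNumber>8080</portNumber>"
    "</ServerConfig>"
)

root = ET.fromstring(xml_text)

# Collect tag names in document order; ElementTree does not change their case.
tag_names = [element.tag for element in root.iter()]
print(tag_names)

# Lookups are case-sensitive, so the lowercase spelling finds nothing.
host = root.find("HostName")
missing = root.find("hostname")
print(host.text, missing)
```

Because lookups are case-sensitive here, code that relies on the exact spelling of a tag in the configuration file behaves as the file's author intended.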

TML answered Sep 30 '22 17:09


According to Leonard Richardson, the creator and maintainer of Beautiful Soup, you can't.

Rob Carr answered Sep 30 '22 17:09