I'm working on code to parse a configuration file written in XML, where the XML tags are mixed case and the case is significant. Beautiful Soup appears to convert XML tags to lowercase by default, and I would like to change this behavior.
I'm not the first to ask a question on this subject [see here]. However, I did not understand the answer given to that question and in BeautifulSoup-3.1.0.1 BeautifulSoup.py does not appear to contain any instances of "encodedName
" or "Tag.__str__
"
import html5lib
from html5lib import treebuilders
f = open("mydocument.html")
parser = html5lib.XMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))
document = parser.parse(f)
'document' is now a BeautifulSoup-like tree, but retains the cases of tags. See html5lib for documentation and installation.
According to Leonard Richardson, creator|maintainer of Beautiful Soup, you can't.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With