I'm trying to scrape an XML file with BeautifulSoup 4.4.0 that has tag names in camelCase, and find_all doesn't seem to be able to find them. Example code:
from bs4 import BeautifulSoup
xml = """
<hello>
world
</hello>
"""
soup = BeautifulSoup(xml, "lxml")
for x in soup.find_all("hello"):
    print(x)
xml2 = """
<helloWorld>
:-)
</helloWorld>
"""
soup = BeautifulSoup(xml2, "lxml")
for x in soup.find_all("helloWorld"):
    print(x)
The output I get is:
$ python soup_test.py
<hello>
world
</hello>
What's the correct way to look up camel cased/uppercased tag names?
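For case-sensitive parsing with BeautifulSoup, use the "xml" parser. The "lxml" parser you are using is an HTML parser, and HTML tag names are case-insensitive, so it converts tag names to lowercase while parsing: your <helloWorld> element is stored as <helloworld>, which is why find_all("helloWorld") finds nothing. Switch from "lxml" to "xml" (which also relies on lxml being installed):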
from bs4 import BeautifulSoup
xml2 = """
<helloWorld>
:-)
</helloWorld>
"""
soup = BeautifulSoup(xml2, "xml")
for x in soup.find_all("helloWorld"):
    print(x)
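To see the difference for yourself, here is a small sketch that parses the same snippet with both parsers and lists the tag names each one produces. It assumes bs4 and lxml are installed; the variable names (html_soup, xml_soup, and so on) are only illustrative.

```python
from bs4 import BeautifulSoup

xml2 = "<helloWorld>:-)</helloWorld>"

# "lxml" is an HTML parser: tag names are lowercased while parsing.
html_soup = BeautifulSoup(xml2, "lxml")
html_names = [tag.name for tag in html_soup.find_all(True)]

# "xml" is lxml's XML parser: tag names keep their original case.
xml_soup = BeautifulSoup(xml2, "xml")
xml_names = [tag.name for tag in xml_soup.find_all(True)]

print(html_names)  # includes the lowercased name
print(xml_names)   # keeps the camelCase name

# The camelCase lookup only matches in the XML-parsed tree.
print(len(html_soup.find_all("helloWorld")))
print(len(xml_soup.find_all("helloWorld")))
```

If you are stuck with an HTML parser for some reason, searching for the lowercased name ("helloworld") should match, but for real XML documents the "xml" parser is the safer choice.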