find_all with camelCase tag names with BeautifulSoup 4

Question

I'm trying to scrape an XML file with BeautifulSoup 4.4.0 that has camelCase tag names, and find_all doesn't seem to be able to find them. Example code:

from bs4 import BeautifulSoup

xml = """
<hello>
    world
</hello>
"""
soup = BeautifulSoup(xml, "lxml")

for x in soup.find_all("hello"):
    print(x)

xml2 = """
<helloWorld>
    :-)
</helloWorld>
"""
soup = BeautifulSoup(xml2, "lxml")

for x in soup.find_all("helloWorld"):
    print(x)

The output I get is:

$ python soup_test.py
<hello>
    world
</hello>

What's the correct way to look up camelCase or uppercase tag names?

heinst · Accepted Answer

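For any case-sensitive parsing with BeautifulSoup, you want to parse in "xml" mode. The "lxml" mode you are using is an HTML parser, and since HTML tag names are case-insensitive, it lowercases them: <helloWorld> is stored as <helloworld>, so find_all("helloWorld") matches nothing. In your case, instead of "lxml", pass "xml", which uses lxml's XML parser and preserves case: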

from bs4 import BeautifulSoup

xml2 = """
<helloWorld>
    :-)
</helloWorld>
"""
soup = BeautifulSoup(xml2, "xml")

for x in soup.find_all("helloWorld"):
    print(x)
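
To see the difference for yourself, here is a minimal sketch that parses the same snippet with both parsers and compares the tag names each one stores. It assumes bs4 and lxml are installed; the variable names (html_soup, xml_soup, html_names, xml_names) are just illustrative.

```python
from bs4 import BeautifulSoup

xml2 = """
<helloWorld>
    :-)
</helloWorld>
"""

# "lxml" selects lxml's HTML parser, which lowercases tag names.
html_soup = BeautifulSoup(xml2, "lxml")
html_names = [tag.name for tag in html_soup.find_all(True)]

# "xml" selects lxml's XML parser, which keeps tag names as written.
xml_soup = BeautifulSoup(xml2, "xml")
xml_names = [tag.name for tag in xml_soup.find_all(True)]

print(html_names)  # includes "helloworld", not "helloWorld"
print(xml_names)   # includes "helloWorld"

# With the HTML parser, only the lowercased name matches.
print(len(html_soup.find_all("helloWorld")))
print(len(html_soup.find_all("helloworld")))

# With the XML parser, the original camelCase name matches.
print(len(xml_soup.find_all("helloWorld")))
```

If you have to stay with the HTML parser for some reason, searching for the lowercased name ("helloworld") should work, but for real XML the "xml" parser is the safer choice.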

Tags: python, beautifulsoup

Paul Johnson
