I'm parsing some data from HTML by walking through elements at a certain level using nextSibling, and doing different things depending on the tag name and class of each element encountered.
e.g.,
if n.name == "p" and n["class"] == "poem": blah()
But this raises an error if the element doesn't have a class or if it isn't an instance of Tag and hence has no name.
Testing before accessing like this
if "name" in n:
always returns False. I could check the type of the object returned by nextSibling to weed out NavigableString and Comment, but there's got to be an easier way.
EDIT
I emailed the developer of BeautifulSoup with this question, and he recommended testing with
n.get("class")
which returns None if "class" is unset, which makes it possible to just do:
if n.get("class") == "poem": blah()
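For anyone on Beautiful Soup 4, here is a minimal sketch of the same idea applied to a sibling walk. It assumes bs4 with the built-in "html.parser", where class is treated as a multi-valued attribute, so get("class") returns a list such as ["poem"] rather than a plain string. The sample HTML and the poem_paragraphs helper are made up for illustration, and next_sibling is the bs4 spelling of nextSibling.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup: tags mixed with a text node and a comment.
HTML = (
    '<div><p class="poem">Roses</p> text <!-- note -->'
    '<p>Plain</p><span class="poem">x</span></div>'
)


def poem_paragraphs(first):
    """Collect the text of every <p class="poem"> among first and its later siblings."""
    found = []
    n = first
    while n is not None:
        # Only Tag objects have attributes; this skips NavigableString and Comment.
        if isinstance(n, Tag) and n.name == "p":
            classes = n.get("class") or []   # None when the attribute is unset
            if isinstance(classes, str):     # some parsers/versions return a plain string
                classes = classes.split()
            if "poem" in classes:
                found.append(n.get_text())
        n = n.next_sibling
    return found


soup = BeautifulSoup(HTML, "html.parser")
result = poem_paragraphs(soup.find("div").contents[0])
print(result)
```

Checking membership with "poem" in classes, rather than comparing with ==, also matches elements that carry additional classes.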
To extract an attribute of an element in Beautiful Soup, use square-bracket notation. For instance, el["id"] retrieves the value of the id attribute.
To find elements that contain specific text in Beautiful Soup, use the find_all() method together with a lambda function.
A tag's attributes can be accessed by treating the tag like a dictionary; the full dictionary is available through its attrs attribute.
The find() method returns the first tag that matches the specified name or id as a bs4 object, or None if nothing matches.
Besides using the get() method:
n.get("class")
another option is to use has_attr() (use has_key() before BeautifulSoup 4):
n.has_attr("class")
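A short sketch of has_attr() in use, assuming Beautiful Soup 4 with the built-in "html.parser". The sample markup is invented for illustration. Note that has_attr() is a Tag method, so text nodes still need to be filtered out first (here with isinstance).

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: one paragraph with a class, a text node, one without.
soup = BeautifulSoup('<p class="poem">a</p>text<p>b</p>', "html.parser")

# Keep only real tags, then ask each one whether it carries a class attribute.
tags = [n for n in soup.contents if isinstance(n, Tag)]
flags = [t.has_attr("class") for t in tags]
print(flags)
```

has_attr() only tells you the attribute is present; you still need get("class") or t["class"] to inspect its value.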
In this case exceptions may be your friend:
try:
    if n.name == 'p' and n['class'] == "poem":
        blah()
except AttributeError:  # element does not have .name attribute
    do_something()
except KeyError:  # element does not have a class
    do_something_else()
You may also combine them into a single except clause if both cases should be handled the same way:
try:
    if n.name == 'p' and n['class'] == "poem":
        blah()
except (AttributeError, KeyError):
    pass
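Here is a self-contained sketch of the exception-based check, assuming Beautiful Soup 4 with the built-in "html.parser". The is_poem helper and the sample markup are hypothetical. Two caveats under that assumption: n["class"] is a list in bs4, so the sketch tests membership instead of equality, and recent bs4 releases give text nodes a .name of None instead of raising AttributeError, so the AttributeError branch mainly guards older versions or a None node.

```python
from bs4 import BeautifulSoup


def is_poem(n):
    """Return True only for a <p> element whose class list contains "poem"."""
    try:
        return n.name == "p" and "poem" in n["class"]
    except (AttributeError, KeyError):
        # AttributeError: n has no .name (e.g. None, or a text node in older versions)
        # KeyError: n is a tag without a class attribute
        return False


# Hypothetical markup: matching <p>, text node, plain <p>, <div> with the class.
soup = BeautifulSoup(
    '<p class="poem">a</p>text<p>b</p><div class="poem">c</div>', "html.parser"
)
checks = [is_poem(n) for n in soup.contents]
print(checks)
```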