 

Checking for attributes in BeautifulSoup?

I'm parsing some data from HTML by walking through elements at a certain level using nextSibling, and doing different things depending on the tag name and class of each element encountered.

e.g.,

if n.name == "p" and n["class"] == "poem": blah()

But this raises an error if the element doesn't have a class or if it isn't an instance of Tag and hence has no name.
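Here is a minimal sketch of the situation described above. It assumes BeautifulSoup 4 (`bs4`) with Python's built-in `html.parser`, and the markup is invented for illustration. It records the type of every node that `next_sibling` visits and shows that indexing a tag with no `class` raises `KeyError`.

```python
from bs4 import BeautifulSoup

# Invented sample: a classed <p>, loose text, a comment, and a plain <p>.
html = '<div><p class="poem">Roses</p>loose text<!-- note --><p>Prose</p></div>'
soup = BeautifulSoup(html, "html.parser")

def sibling_types(start):
    """Walk next_sibling from `start`, recording each node's type name."""
    kinds = []
    n = start
    while n is not None:
        kinds.append(type(n).__name__)
        n = n.next_sibling
    return kinds

first = soup.div.contents[0]
print(sibling_types(first))  # ['Tag', 'NavigableString', 'Comment', 'Tag']

plain = soup.find_all("p")[1]  # the <p> that has no class attribute
try:
    plain["class"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'class'
```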

Testing before accessing like this

if "name" in n:

always returns False. I could check the type of the object returned by nextSibling to weed out NavigableString and Comment, but there's got to be an easier way.
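`"name" in n` is False because `in` on a tag tests membership in its children (`tag.contents`), not in its attributes. The type check mentioned above can be kept short with `isinstance`. This is a sketch that again assumes bs4 and made-up markup. `tags_only` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '<div><p class="poem">Roses</p>loose text<!-- note --><p>Prose</p></div>'
soup = BeautifulSoup(html, "html.parser")
first_p = soup.p

# `in` on a Tag searches its children (first_p.contents), not its attributes.
print("name" in first_p)   # False
print("Roses" in first_p)  # True: the text node is a child

def tags_only(start):
    """Yield start and its following siblings, skipping non-Tag nodes."""
    n = start
    while n is not None:
        if isinstance(n, Tag):  # NavigableString and Comment are not Tags
            yield n
        n = n.next_sibling

print([t.name for t in tags_only(first_p)])  # ['p', 'p']
```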

EDIT

I emailed the developer of BeautifulSoup with this question, and he recommended testing with

n.get("class")

which returns None if "class" is unset, which makes it possible to just do:

if n.get("class") == "poem": blah()
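There is a caveat if you are on BeautifulSoup 4. bs4 treats `class` as a multi-valued attribute, so `n.get("class")` returns a list such as `['poem']`. Comparing that list to the string `"poem"` is always False. The following sketch uses a membership test instead. `is_poem` is a hypothetical helper, and the markup is invented.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '<div><p class="poem">Roses</p>loose text<p>Prose</p></div>'
soup = BeautifulSoup(html, "html.parser")
poem = soup.find("p", class_="poem")
prose = soup.find_all("p")[1]

# bs4 treats "class" as multi-valued, so get() returns a list of names.
print(poem.get("class"))            # ['poem']
print(poem.get("class") == "poem")  # False: a list never equals a string
print(prose.get("class"))           # None: attribute not set

def is_poem(n):
    """True only for <p> Tags whose class list includes "poem"."""
    # Rule out strings and comments before calling Tag.get().
    return isinstance(n, Tag) and n.name == "p" and "poem" in n.get("class", [])

print([is_poem(n) for n in soup.div.children])  # [True, False, False]
```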
asked Aug 09 '11 22:08 by blocks





2 Answers

Besides using the get() method:

n.get("class")

Another option is to use has_attr() (or has_key() in versions before BeautifulSoup 4):

n.has_attr("class")
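`has_attr()` is a method of `Tag`, and plain strings and comments are not tags. A walk that can also meet those nodes therefore still needs a type check first. This is a minimal sketch that assumes bs4. `classed_paragraphs` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '<div><p class="poem">Roses</p>loose text<p>Prose</p></div>'
soup = BeautifulSoup(html, "html.parser")

def classed_paragraphs(parent):
    """Text of each child <p> that has a class attribute, whatever its value."""
    found = []
    for n in parent.children:
        # has_attr() is a Tag method, so skip strings and comments first.
        if isinstance(n, Tag) and n.name == "p" and n.has_attr("class"):
            found.append(n.get_text())
    return found

print(classed_paragraphs(soup.div))  # ['Roses']
```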
answered Oct 06 '22 00:10 by Jasper van den Bosch



In this case exceptions may be your friend:

try:
    if n.name == 'p' and n['class'] == "poem":
        blah()
except AttributeError: # element does not have .name attribute
    do_something()
except KeyError: # element does not have a class
    do_something_else()

You can also combine both cases into a single except clause if they should be handled the same way:

try:
    if n.name == 'p' and n['class'] == "poem":
        blah()
except (AttributeError, KeyError):
    pass
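The exception approach can be placed inside a complete sibling walk, as sketched below under the assumption of bs4. Two adaptations here are assumptions, not part of the answer. First, the class test uses membership because bs4 stores `class` as a list. Second, the loop advances with `next_sibling` outside the `try`, so a skipped node never stalls it. `find_poems` is a hypothetical name, and the markup is invented.

```python
from bs4 import BeautifulSoup

html = '<div><p class="poem">Roses</p>loose text<!-- note --><p>Prose</p></div>'
soup = BeautifulSoup(html, "html.parser")

def find_poems(start):
    """Walk next_sibling from `start`, collecting text of <p class="poem"> tags.

    Nodes without a usable .name or without a class are skipped by catching
    the exception instead of checking types up front.
    """
    poems = []
    n = start
    while n is not None:
        try:
            # bs4 stores class as a list, so test membership rather than equality.
            if n.name == "p" and "poem" in n["class"]:
                poems.append(n.get_text())
        except (AttributeError, KeyError):
            pass  # not a Tag, or a Tag with no class attribute
        n = n.next_sibling
    return poems

print(find_poems(soup.div.contents[0]))  # ['Roses']
```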
answered Oct 06 '22 00:10 by Michał Bentkowski
