Parse HTML with Beautiful Soup. Return text from specific tag

Question

I can retrieve an entire HTML tag, selected by one of its attributes, with a Python script run from a Unix shell like this:

#!/usr/bin/python3

# import the module
from bs4 import BeautifulSoup

# build the soup object, naming the parser explicitly
soup = BeautifulSoup(open("test.html"), "html.parser")

# get the tag
print(soup(itemprop="name"))

Here, itemprop="name" uniquely identifies the tag I need.

The output is something like:

[<span itemprop="name">
                    Blabla &amp; Bloblo</span>]

Now I would like to return only the text inside the tag, i.e. the Blabla & Bloblo part.

My attempt was:

print(soup(itemprop="name").getText())

But this raises AttributeError: 'ResultSet' object has no attribute 'getText'

The same call did work in other contexts, such as:

print(soup.find('span').getText())

So what am I getting wrong?

Martijn Pieters · Accepted Answer

Using the soup object as a callable returns a list of results, as if you used soup.find_all(). See the documentation:

Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object.
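To make the quoted behaviour concrete, here is a minimal sketch that compares the callable shortcut with find_all(). The inline HTML string stands in for test.html and is an assumption, as is the choice of the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the contents of test.html
html = '<div><span itemprop="name">\n    Blabla &amp; Bloblo</span></div>'
soup = BeautifulSoup(html, "html.parser")

called = soup(itemprop="name")          # shortcut form
found = soup.find_all(itemprop="name")  # explicit form

# Both are list-like result sets holding Tag objects
print(type(called).__name__)
print(len(called), len(found))

# The result set itself has no get_text(); only its elements do
print(hasattr(called, "get_text"))
print(hasattr(called[0], "get_text"))
```

Since the result is a list of tags rather than a single tag, text methods have to be called on an element of it, not on the result set.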

Use soup.find() to find just the first match:

soup.find(itemprop="name").get_text()

or index into the resultset:

soup(itemprop="name")[0].get_text()
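As a rough end-to-end sketch of the first option: the snippet below uses an inline HTML string in place of test.html (an assumption), passes strip=True to drop the leading newline and indentation seen in the question's output, and guards against find() returning None when nothing matches. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for test.html, mirroring the output shown in the question
html = '<span itemprop="name">\n                    Blabla &amp; Bloblo</span>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if there is no match
tag = soup.find(itemprop="name")

if tag is not None:
    # strip=True trims surrounding whitespace; &amp; is decoded to &
    name = tag.get_text(strip=True)
else:
    name = None

print(name)  # Blabla & Bloblo
```

The indexing form, soup(itemprop="name")[0], raises IndexError on an empty result instead of returning None, so which variant is more convenient depends on how a missing tag should be handled.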

Tags: python, html, beautifulsoup

joaoal
