I can extract a complete HTML tag with a Python script run from a Unix shell, like this:
#!/usr/bin/python3
# import the module
from bs4 import BeautifulSoup
# define your object
soup = BeautifulSoup(open("test.html"), "html.parser")
# get the tag
print(soup(itemprop="name"))
where itemprop="name" uniquely identifies the required tag.
The output is something like
[<span itemprop="name">
Blabla & Bloblo</span>]
Now I would like to return only the text inside the tag, i.e. the Blabla & Bloblo part.
My attempt was:
print(soup(itemprop="name").getText())
But I get an error message like: AttributeError: 'ResultSet' object has no attribute 'getText'
The same call worked when I tried it in other contexts, such as
print(soup.find('span').getText())
So what am I getting wrong?
Using the soup object as a callable returns a list of results, as if you used soup.find_all(). See the documentation:
Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it's the same as calling find_all() on that object.
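As a rough illustration of the shortcut described in the quote, the sketch below parses a small inline HTML string (an assumption standing in for test.html) with the built-in "html.parser" and compares the two call styles. It is only meant to show why getText() is not available on the result.

```python
from bs4 import BeautifulSoup

# Inline stand-in for test.html (assumed content, modelled on the question's output).
html = '<div><span itemprop="name">\nBlabla &amp; Bloblo</span></div>'
soup = BeautifulSoup(html, "html.parser")

called = soup(itemprop="name")          # calling the soup object directly
found = soup.find_all(itemprop="name")  # the explicit equivalent

# Both calls give a ResultSet (a list subclass) holding the same tags.
same_results = called == found
result_type = type(called).__name__

# The ResultSet itself has no get_text(); only the Tag objects inside it do.
list_has_get_text = hasattr(called, "get_text")
tag_has_get_text = hasattr(called[0], "get_text")

print(result_type, same_results, list_has_get_text, tag_has_get_text)
```

So the error in the question comes from calling a Tag method on the list-like container rather than on one of its elements.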
Use soup.find() to find just the first match:
soup.find(itemprop="name").get_text()
or index into the resultset:
soup(itemprop="name")[0].get_text()
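Both variants above raise an exception if no tag matches (find() returns None, and indexing an empty result set fails). A minimal sketch of a more defensive version follows; the helper name text_of, the inline HTML, and the itemprop="price" lookup are hypothetical and only used for illustration.

```python
from bs4 import BeautifulSoup

# Inline stand-in for test.html (assumed content, modelled on the question's output).
html = '<div><span itemprop="name">\nBlabla &amp; Bloblo</span></div>'
soup = BeautifulSoup(html, "html.parser")


def text_of(soup, **attrs):
    """Return the stripped text of the first matching tag, or None if nothing matches."""
    tag = soup.find(**attrs)
    if tag is None:
        return None
    # strip=True removes the leading newline seen in the printed tag.
    return tag.get_text(strip=True)


name = text_of(soup, itemprop="name")
missing = text_of(soup, itemprop="price")  # no such tag in the sample HTML

# To collect the text of every match, iterate over the result set instead.
all_names = [tag.get_text(strip=True) for tag in soup(itemprop="name")]

print(name)
print(missing)
print(all_names)
```

Passing strip=True is optional; without it the text keeps the newline that appears inside the span.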