How can I find the li tags that have one specific class name and no others? For example:
...
<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
...
<li class="z"> I WANT THIS ONLY ONE</li>
...
The code:
bs4.find_all('li', class_='z')
returns every entry whose class list contains "z", including those that also have another class name.
How can I find the entry whose only class name is "z"?
You can use a CSS attribute selector to match the class attribute exactly:
from bs4 import BeautifulSoup

html = '''<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>'''
soup = BeautifulSoup(html, 'lxml')
tags = soup.select('li[class="z"]')
print(tags)
The same result can be achieved with a lambda filter:
tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])
Output:
[<li class="z"> I WANT THIS ONLY ONE</li>]
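To see the difference between the three queries side by side, here is a minimal sketch. It reuses the sample markup from the question and, as an assumption, swaps in the built-in 'html.parser' so that it runs without lxml installed; the variable names loose, exact and by_filter are only illustrative.

```python
from bs4 import BeautifulSoup

html = """<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>"""

# Assumption: html.parser instead of lxml, to avoid the extra dependency.
soup = BeautifulSoup(html, "html.parser")

# class_='z' matches any tag that has "z" among its classes.
loose = soup.find_all("li", class_="z")

# The attribute selector compares the whole class attribute value.
exact = soup.select('li[class="z"]')

# The function filter compares the parsed class list to ['z'].
by_filter = soup.find_all(
    lambda tag: tag.name == "li" and tag.get("class") == ["z"]
)

print(len(loose))      # 4
print(len(exact))      # 1
print(len(by_filter))  # 1
```

Both exact approaches return the same single tag, while the class_ keyword also returns the three tags that carry a second class.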
Have a look at Multi-valued attributes in the Beautiful Soup documentation. You'll understand why class_='z' matches all the tags that have z among their class names.
HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:
css_soup = BeautifulSoup('<p class="body"></p>')
css_soup.p['class']
# ["body"]
css_soup = BeautifulSoup('<p class="body strikeout"></p>')
css_soup.p['class']
# ["body", "strikeout"]
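Because the class value is stored as a list, an exact match can also be written as a plain comparison on that list. The sketch below is one possible way to do it; has_only_classes is a hypothetical helper (not part of Beautiful Soup) that compares the classes as a set, so the order in which they appear in the markup does not matter.

```python
from bs4 import BeautifulSoup


def has_only_classes(tag, *names):
    """Return True if the tag's classes are exactly `names`, in any order."""
    # Tags without a class attribute fall back to an empty list.
    return set(tag.get("class", [])) == set(names)


# Small illustrative document; html.parser is assumed to keep it dependency-free.
soup = BeautifulSoup(
    '<li class="a z">x</li><li class="z">y</li><li>w</li>',
    "html.parser",
)

# Inspect how each class attribute is stored.
for li in soup.find_all("li"):
    print(li.get("class"), has_only_classes(li, "z"))

# Keep only the tags whose single class is "z".
matches = [li for li in soup.find_all("li") if has_only_classes(li, "z")]
print(matches)
```

Comparing sets rather than lists is a design choice: it treats class="a z" and class="z a" as the same, which the list comparison tag.get('class') == [...] would not.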