I'm converting some HTML parsing code from BeautifulSoup to lxml. I'm trying to figure out the lxml equivalent syntax for the following BeautifulSoup statement:
soup.find('a', {'class': ['current zzt', 'zzt']})
Basically I want to find all of the "a" tags in the document that have a class attribute of either "current zzt" or "zzt". BeautifulSoup allows one to pass in a list, a dictionary, or even a regular expression to perform the match.
What is the lxml equivalent?
Thanks!
No, lxml does not provide the "find first or return None" method you're looking for. If you need that, just use
(select(soup) or [None])[0]
or write a function to do it for you.
#!/usr/bin/python
import lxml.html
import lxml.cssselect
soup = lxml.html.fromstring("""
<html>
<a href="foo" class="yyy zzz" />
<a href="bar" class="yyy" />
<a href="baz" class="zzz" />
<a href="quux" class="zzz yyy" />
<a href="warble" class="qqq" />
<p class="yyy zzz">Hello</p>
</html>""")
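# Compile the CSS selector once; calling it on an element returns a list of matches.
select = lxml.cssselect.CSSSelector("a.yyy.zzz, a.yyy")
# Print every match, serialized back to HTML text.
print([lxml.html.tostring(s, encoding="unicode").strip() for s in select(soup)])
# Print the first match, or None if nothing matched.
print((select(soup) or [None])[0])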
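The "write a function to do it for you" suggestion could look roughly like the sketch below. The name first_or_none is made up for illustration and is not part of lxml. A compiled lxml.etree.XPath object is used here so the example has no extra dependencies; a CSSSelector is called the same way, so it should be usable as a drop-in.

```python
import lxml.etree
import lxml.html


def first_or_none(select, root):
    """Return the first element matched by a compiled selector, or None.

    `select` is any callable that takes an element and returns a list,
    for example an lxml.etree.XPath or lxml.cssselect.CSSSelector object.
    (first_or_none is a hypothetical helper, not an lxml function.)
    """
    matches = select(root)
    return matches[0] if matches else None


doc = lxml.html.fromstring("""
<html><body>
<a href="foo" class="yyy zzz"></a>
<a href="bar" class="yyy"></a>
</body></html>""")

# Matches only the <a> whose class attribute is exactly "yyy".
select_yyy = lxml.etree.XPath("//a[@class='yyy']")
# Matches nothing in this document.
select_missing = lxml.etree.XPath("//a[@class='nope']")

first = first_or_none(select_yyy, doc)
missing = first_or_none(select_missing, doc)

print(first.get("href"))
print(missing)
```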
OK, so soup.find('a')
would indeed return the first a element, or None, as you expect. The trouble is that it doesn't appear to support the rich XPath syntax that CSSSelector needs.
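For what it's worth, find() takes a limited path expression (a bare 'a' only looks at direct children, while './/a' searches descendants), whereas the xpath() method on lxml elements accepts full XPath 1.0. Below is a minimal sketch of the original question done that way, assuming "class attribute of either current zzt or zzt" means an exact match on the attribute string. The sample markup and variable names are invented for the example.

```python
import lxml.html

doc = lxml.html.fromstring("""
<html><body>
<a href="one" class="current zzt">one</a>
<a href="two" class="zzt">two</a>
<a href="three" class="zzt other">three</a>
<a href="four" class="qqq">four</a>
</body></html>""")

# Exact match: the whole class attribute must equal one of the two strings.
exact = doc.xpath("//a[@class='current zzt' or @class='zzt']")

# Token match: any <a> whose space-separated class list contains "zzt".
token = doc.xpath(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' zzt ')]"
)

exact_hrefs = [a.get("href") for a in exact]
token_hrefs = [a.get("href") for a in token]

# xpath() always returns a list, so "first or None" is still a manual step.
first_match = exact[0] if exact else None

print(exact_hrefs)
print(token_hrefs)
```

The token form is closer to what a CSS class selector such as a.zzt does; the exact form is closer to how the question words the requirement.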