
lxml equivalent to BeautifulSoup "OR" syntax?

I'm converting some HTML parsing code from BeautifulSoup to lxml. I'm trying to figure out the lxml equivalent of the following BeautifulSoup statement:

soup.find('a', {'class': ['current zzt', 'zzt']})

Basically, I want to find all of the "a" tags in the document that have a class attribute of either "current zzt" or "zzt". BeautifulSoup allows one to pass in a list, a dictionary, or even a regular expression to perform the match.

What is the lxml equivalent?

Thanks!

erikcw asked Sep 05 '09 23:09



1 Answer

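No, lxml does not provide the "find first or return None" method you're looking for. Just use (select(soup) or [None])[0] if you need that, or write a function to do it for you. The "or" itself can be expressed as a comma-separated group of CSS selectors, as in the example below.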

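#!/usr/bin/python
# Requires lxml plus the separate cssselect package.
import lxml.html
import lxml.cssselect

soup = lxml.html.fromstring("""
        <html>
        <a href="foo" class="yyy zzz" />
        <a href="bar" class="yyy" />
        <a href="baz" class="zzz" />
        <a href="quux" class="zzz yyy" />
        <a href="warble" class="qqq" />
        <p class="yyy zzz">Hello</p>
        </html>""")

# The comma acts as "or": <a> elements with both classes, or with class yyy.
select = lxml.cssselect.CSSSelector("a.yyy.zzz, a.yyy")

# Serialize every match to text.
print([lxml.html.tostring(s, encoding="unicode").strip() for s in select(soup)])

# First match, or None when nothing matches.
print((select(soup) or [None])[0])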
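
The "write a function" option could look like the sketch below. The name first_or_none is made up for illustration and is not part of lxml; plain lists stand in for the list of elements that select(soup) returns.

```python
def first_or_none(matches):
    """Return the first item of `matches`, or None if there are none.

    Hypothetical helper (not an lxml API); works on any iterable.
    """
    for match in matches:
        return match
    return None


# Plain lists stand in for the result of select(soup).
print(first_or_none(["first match", "second match"]))
print(first_or_none([]))
```

With the selector from the answer, the call would be first_or_none(select(soup)).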

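OK, so soup.find('a') would indeed return the first a element or None, as you expect (use './/a' to search all descendants rather than only direct children). The trouble is that find() accepts only the limited ElementPath syntax, not the full XPath that CSSSelector relies on.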
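
If an exact match on the attribute string is what is wanted (class equal to "current zzt" or to "zzt", as in the question), full XPath through the element's xpath() method is one way to write the "or" directly. This is a minimal sketch, not necessarily what the answer's author had in mind; the sample markup and variable names are invented for illustration. The second query shows the usual XPath idiom for matching a single class token regardless of order, which is closer to what a CSS selector such as a.zzt does.

```python
import lxml.html

# Invented sample markup for illustration.
doc = lxml.html.fromstring("""
<html><body>
<a href="one" class="current zzt">one</a>
<a href="two" class="zzt">two</a>
<a href="three" class="zzt current">three</a>
<a href="four" class="other">four</a>
<p class="zzt">not a link</p>
</body></html>""")

# Exact string match: the class attribute is literally one of the two values.
exact = doc.xpath("//a[@class='current zzt' or @class='zzt']")
exact_hrefs = [a.get("href") for a in exact]

# Token match: any <a> whose class list contains the token "zzt".
token = doc.xpath(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' zzt ')]"
)
token_hrefs = [a.get("href") for a in token]

# First match or None, using the idiom from the answer.
first = (exact or [None])[0]

print(exact_hrefs)
print(token_hrefs)
print(first.get("href") if first is not None else None)
```

The exact-match query skips class="zzt current" because the attribute string differs, while the token query picks it up.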

joeforker answered Oct 03 '22 05:10
