lxml not properly parsing tags with multiple classes

Tags:

python

lxml

I am trying to parse HTML using lxml:

import lxml.html

a = lxml.html.fromstring('<html><body><span class="cut cross">Text of double class</span><span class="cross">Text of single class</span></body></html>')
s1 = a.xpath('.//span[@class="cross"]')
s2 = a.xpath('.//span[@class="cut cross"]')
s3 = a.xpath('.//span[@class="cut"]')

Output:

s1 => [<Element span at 0x7f0a6807a530>]
s2 => [<Element span at 0x7f0a6807a590>]
s3 => []

The first span tag has the class 'cut', yet s3 is empty, whereas s2, where I give both classes, returns the tag.


asked Jan 21 '13 15:01

WeaklyTyped

2 Answers

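XPath's equality operator compares the whole attribute value with the string on the right, so @class="cut" only matches an element whose class attribute is exactly "cut". If you want to match one class among several, you can use the contains function: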

a.xpath('.//span[contains(@class, "cut")]')

However, this also matches a class such as cut2, because contains performs a plain substring test.
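One common way around the cut2 false positive is to pad the class attribute with spaces and search for the padded token instead. The sketch below is not part of the original answer; it reuses the markup from the question and adds a hypothetical third span with class cut2 purely to show the difference between the two tests.

```python
import lxml.html

# Markup from the question, plus a hypothetical "cut2" span for comparison.
doc = lxml.html.fromstring(
    '<html><body>'
    '<span class="cut cross">Text of double class</span>'
    '<span class="cross">Text of single class</span>'
    '<span class="cut2">Text of look-alike class</span>'
    '</body></html>'
)

# Plain substring test: also matches the "cut2" span.
loose = doc.xpath('.//span[contains(@class, "cut")]')

# Token test: pad the whitespace-normalized class list with spaces,
# then look for " cut " so that only the whole class name matches.
strict = doc.xpath(
    './/span[contains(concat(" ", normalize-space(@class), " "), " cut ")]'
)

loose_texts = [el.text for el in loose]
strict_texts = [el.text for el in strict]
print(loose_texts)   # two spans: the "cut cross" one and the "cut2" one
print(strict_texts)  # only the "cut cross" span
```

The normalize-space call collapses tabs, newlines, and repeated spaces, so the test still works when the class attribute is formatted untidily.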

cssselect is a library that handles CSS selectors, which treat the class attribute as a list of separate classes. pyquery, a wrapper around lxml, mimics the jQuery library in Python.


answered Oct 14 '22 13:10

Scharron

I'm pretty sure XPath queries don't follow the CSS data model (i.e. classes are space-separated values in a single class attribute); XPath sees the class attribute as one plain string. To do what you want, you should look at using CSS selectors (for example, via cssselect).
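To make the point about the data model concrete, here is a small sketch (not from the original answer) showing that lxml hands back the class attribute as a single string, and how one might split it in Python when CSS selectors are not an option. The helper has_class is a hypothetical name introduced only for this example.

```python
import lxml.html

# Same markup as in the question.
doc = lxml.html.fromstring(
    '<html><body>'
    '<span class="cut cross">Text of double class</span>'
    '<span class="cross">Text of single class</span>'
    '</body></html>'
)

first_span = doc.xpath('.//span')[0]

# The attribute is one string, which is why @class="cut" finds nothing.
raw_class = first_span.get('class')
tokens = raw_class.split()


def has_class(element, name):
    """Hypothetical helper: True if name is one of the element's classes."""
    return name in element.get('class', '').split()


# Filter in Python instead of inside the XPath expression.
cut_spans = [el for el in doc.iter('span') if has_class(el, 'cut')]
cut_texts = [el.text for el in cut_spans]
print(raw_class, tokens, cut_texts)
```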

answered Oct 14 '22 13:10

djc
