
python lxml different result on windows and linux

Tags:

python

xpath

lxml

elementtree

On Linux:

>>> from lxml import etree
>>> html='''<td><a href=''>a1</a></td>
... <td><a href=''>a2</a></td>
... '''
>>> p=etree.HTML(html)
>>> a=p.xpath("//a[1]")
>>> for i in a:
...    print i.text
... 
a1
a2

On Windows:

>>> html='''<td><a href=''>a1</a></td>
... <td><a href=''>a2</a></td>
... '''
>>> from lxml import etree
>>> p=etree.HTML(html)
>>> a=p.xpath("//a[1]")
>>> for i in a:
...    print i.text
...
a1
>>> b=p.xpath("//a[2]")
>>> for i in b:
...    print i.text
...
a2

On Windows, I can easily use a[1] and a[2] to get those two values one at a time. But on Linux, the xpath //a[1] returns both link texts together.

This makes the program incompatible across the two operating systems, so I have to modify the code for each OS. Is this a bug in the lxml module? Is there a solution?


asked Jun 06 '14 05:06

Niuya

1 Answer

I can confirm the result you report on Linux: the xpath returns a list of two elements instead of a single element.

What is xpath `//a[1]` asking for

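It asks for every a element that is the first a child within its context, that is, within its parent.

Since each a element is nested inside a td, the td is the context for calculating the position, and there are two such td elements, so two a elements match.

Changing the xpath to "(//a)[1]" resolves the problem, because the position is then counted over the whole set of a elements in the document.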

Quoting from the MSDN page "Operators and Special Characters":

The filter pattern operators ([]) have a higher precedence than the path operators (/ and //). For example, the expression //comment()[3] selects all comments with an index equal to 3 relative to the comment's parent anywhere in the document. This differs from the expression (//comment())[3], which selects the third comment from the set of all comments relative to the parent. The first expression can return more than one comment, while the latter can return only one comment.
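The difference between the two forms can be checked directly. The following is a minimal sketch, assuming an lxml install that follows the XPath 1.0 rules quoted above; the helper name `texts` and the wrapping `root` element are illustrative and not part of the original question. On such an install, `//a[1]` should match both links and `(//a)[1]` only the first.

```python
from lxml import etree

# Same markup as in the question, wrapped in a root element so it parses as XML.
doc = etree.fromstring(
    "<root>"
    "<td><a href=''>a1</a></td>"
    "<td><a href=''>a2</a></td>"
    "</root>"
)


def texts(expr):
    """Return the text of every node matched by the XPath expression (hypothetical helper)."""
    return [node.text for node in doc.xpath(expr)]


per_parent = texts("//a[1]")         # first a child of each parent
first_overall = texts("(//a)[1]")    # first a in document order
second_overall = texts("(//a)[2]")   # second a in document order
second_per_parent = texts("//a[2]")  # an a that is the second a child of its parent

print(per_parent, first_overall, second_overall, second_per_parent)
```

Using the parenthesized form whenever "the n-th match in the document" is meant keeps the intent explicit, regardless of how the elements happen to be nested.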

Downgrade broken Windows lxml version 3.3.5

An xpath //a[1] that returns only one element for the provided document is simply wrong and should be reported to the lxml authors.

Status of lxml on different platforms:

Windows: lxml 2.3.0 - OK
Windows: lxml 3.3.5 - BUG
Linux: lxml 3.3.5 - OK
Linux: lxml 2.3.0 - OK

To make your solution portable, you should require lxml==2.3.0, as this version behaves correctly on both Windows and Linux (another version might work well on both platforms; I did not test more).
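When reporting the problem, it may help to attach the exact versions involved. The sketch below only collects the version constants that lxml exposes for itself and for the libxml2 library it wraps; the function name `lxml_build_info` is made up for this example, and whether a libxml2 difference plays any role in the behaviour described here was not tested in this answer.

```python
from lxml import etree


def lxml_build_info():
    """Collect version details worth attaching to a bug report (hypothetical helper)."""
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
    }


info = lxml_build_info()
for name, version in sorted(info.items()):
    # Each value is a tuple of integers; join it into a dotted version string.
    print("%s: %s" % (name, ".".join(str(part) for part in version)))
```

Running this on both the Windows and the Linux machine gives a like-for-like comparison to include alongside the failing xpath.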

Bonus - test suite

Assuming you have nose installed:

$ pip install nose

you can use the following test_xpath.py:

from lxml import etree
import nose

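# Print the lxml version so results from different installs can be compared.
# The parenthesized form runs on both Python 2 and Python 3.
print("==================================")
print("lxml version:  " + etree.__version__)
print("==================================")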

def test_html():
    html_str = """
    <td><a href=''>a1</a></td>
    <td><a href=''>a2</a></td>
    """
    doc = etree.HTML(html_str.strip())
    elms = doc.xpath("//a[1]")
    assert len(elms) == 2, """xpath `//a[1]` shall return 2 elements"""
    assert all(elm.tag == "a" for elm in elms), "all returned elements shall be `a`"
    assert elms[0].text == "a1"
    assert elms[1].text == "a2"

def test_xml():
    xml_str = """
    <root>
        <td><a href=''>a1</a></td>
        <td><a href=''>a2</a></td>
    </root>
    """
    doc = etree.fromstring(xml_str.strip())
    elms = doc.xpath("//a[1]")
    assert len(elms) == 2, """xpath `//a[1]` shall return 2 elements"""
    assert all(elm.tag == "a" for elm in elms), "all returned elements shall be `a`"
    assert elms[0].text == "a1"
    assert elms[1].text == "a2"

nose.main()

and run the tests:

$ python test_xpath.py  -v
==================================
lxml version:  2.3.0
==================================
test_xpath.test_html ... ok
test_xpath.test_xml ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK

answered Sep 18 '22 02:09

Jan Vlcinsky
