Is it possible for xpath to return NULL if there is no text data?

Tags: xpath

I am trying to extract all data from a table. Table cells are formatted as <td headers="h1" align="left"></td> when there is no data.

Using the etree.tostring() method from the lxml library prints out these elements as <td headers="h1" align="left"/> instead of the source formatting.

Furthermore, when I run tree.xpath('//td[@headers="h1"]/text()'), the resulting list does not include blank values for the cells that have no data.

As I am trying to write these results to a CSV file, how do I include NULL, i.e. "" when there is no data?

Asked Oct 31 '22 22:10 by toolshed


1 Answer

One workaround is to select the elements themselves with the XPath //td[@headers="h1"] and then read the .text attribute of each one; .text is None for an element with no text:

from lxml import etree

data = """
<table>
    <tr>
        <td headers="h1" align="left"></td>
        <td headers="h1" align="left">Text1</td>
        <td headers="h1" align="left"/>
        <td headers="h1" align="left">Text2</td>
        <td headers="h1" align="left"></td>
    </tr>
</table>
"""

tree = etree.fromstring(data)
print([element.text for element in tree.xpath('//td[@headers="h1"]')])

Prints:

[None, 'Text1', None, 'Text2', None]

If you want empty string instead of None:

print([element.text if element.text is not None else ''
       for element in tree.xpath('//td[@headers="h1"]')])

would print:

['', 'Text1', '', 'Text2', '']
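
Since the question mentions writing the results to a CSV file, here is a minimal sketch of that last step. It is an illustration, not part of the original answer: the io.StringIO buffer stands in for a real output file, the sample table is shortened, and the import falls back to the standard library's ElementTree if lxml is not installed. findall() is used instead of xpath() only because both libraries support it.

```python
import csv
import io

try:
    from lxml import etree  # the library used in the answer
except ImportError:
    # Assumption for this sketch: fall back to the standard library.
    import xml.etree.ElementTree as etree

data = """
<table>
    <tr>
        <td headers="h1" align="left"></td>
        <td headers="h1" align="left">Text1</td>
        <td headers="h1" align="left"/>
    </tr>
</table>
"""

tree = etree.fromstring(data)

# Select the elements (not their text nodes) so empty cells are kept,
# then map a missing text value (None) to an empty string.
cells = [td.text or "" for td in tree.findall('.//td[@headers="h1"]')]

# Stand-in for: open("out.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(cells)

csv_text = buffer.getvalue()
print(repr(csv_text))  # ',Text1,\r\n'
```

Note that csv.writer already writes None as an empty field, so the explicit conversion mainly matters if the list is used elsewhere before it reaches the CSV.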

Also see: How do I return '' for an empty node's text() in XPath?
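
An alternative that stays closer to XPath, sketched here as an assumption rather than something the answer proposes, is to evaluate the XPath string() function once per element. It returns '' for an element with no text. Unlike .text, string() concatenates the text of all descendants, so the two differ for cells that contain nested markup.

```python
from lxml import etree

data = """
<table>
    <tr>
        <td headers="h1" align="left"></td>
        <td headers="h1" align="left">Text1</td>
        <td headers="h1" align="left"/>
        <td headers="h1" align="left">Text2</td>
        <td headers="h1" align="left"></td>
    </tr>
</table>
"""

tree = etree.fromstring(data)

# string() is evaluated with each <td> as the context node;
# str() converts lxml's string result type to a plain str.
values = [str(td.xpath('string()'))
          for td in tree.xpath('//td[@headers="h1"]')]

print(values)  # ['', 'Text1', '', 'Text2', '']
```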

Answered Jan 01 '23 12:01 by alecxe