I am currently trying to extract all data from a table. Table data cells are formatted as <td headers="h1" align="left"></td> when there is no data.
The etree.tostring() method from the lxml library prints these elements as <td headers="h1" align="left"/> instead of keeping the source formatting.
Furthermore, if I run the XPath query tree.xpath('//td[@headers="h1"]/text()'), the resulting list does not include blank values where there is no data.
As I am trying to write these results to a CSV file, how do I include NULL, i.e. "", when there is no data?
One workaround would be to use the //td[@headers="h1"] XPath to get the elements themselves and then read the .text property of each:
from lxml import etree
data = """
<table>
<tr>
<td headers="h1" align="left"></td>
<td headers="h1" align="left">Text1</td>
<td headers="h1" align="left"/>
<td headers="h1" align="left">Text2</td>
<td headers="h1" align="left"></td>
</tr>
</table>
"""
tree = etree.fromstring(data)
# .text is None for an element with no text content
print([element.text for element in tree.xpath('//td[@headers="h1"]')])
Prints:
[None, 'Text1', None, 'Text2', None]
If you want an empty string instead of None:
print([element.text if element.text is not None else ''
       for element in tree.xpath('//td[@headers="h1"]')])
would print:
['', 'Text1', '', 'Text2', '']
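Since the goal is a CSV file, here is a minimal sketch of how the same idea could feed Python's csv module. It assumes Python 3 and one CSV row per <tr>. The names rows, buffer and csv_text are made up for illustration, and io.StringIO stands in for a real output file so the snippet stays self-contained.

```python
import csv
import io

from lxml import etree

data = """
<table>
<tr>
<td headers="h1" align="left"></td>
<td headers="h1" align="left">Text1</td>
<td headers="h1" align="left"/>
<td headers="h1" align="left">Text2</td>
<td headers="h1" align="left"></td>
</tr>
</table>
"""

tree = etree.fromstring(data)

# Build one list of cell values per <tr>.
# `cell.text or ''` replaces None (no text) with an empty string.
rows = [
    [cell.text or '' for cell in tr.xpath('./td[@headers="h1"]')]
    for tr in tree.xpath('//tr')
]

# Assumption: an in-memory buffer instead of a file on disk.
# For a real file, open('out.csv', 'w', newline='') could be used instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Empty cells then show up as empty fields between the commas, so the column positions line up with the source table.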
Also see: How do I return '' for an empty node's text() in XPath?
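On the serialization part of the question: <td ...></td> and <td .../> are equivalent in XML, so the default XML serializer is free to pick the short form. If the explicit closing tag matters, one option that may help is asking etree.tostring() for HTML serialization. This is only a sketch; the variable names are illustrative.

```python
from lxml import etree

td = etree.fromstring('<td headers="h1" align="left"></td>')

# Default XML serialization collapses an empty element to the self-closing form.
xml_out = etree.tostring(td)

# HTML serialization writes an explicit closing tag for non-void elements.
html_out = etree.tostring(td, method="html")

print(xml_out)
print(html_out)
```

Either way the parsed tree is the same; only the textual output differs.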