
Simple example of how to parse data from HTML output using lxml

I'm converting some Python scripts that use regex to extract content from HTML output so that they use libxml2 instead. Since I'm just starting with this, a little help would be appreciated.

How can I extract the values for "Working Dir", "Packages/Updates", and "Java Data Model" from the example below using lxml?

<tr>
  <script>writeTD("row");</script>
  <td class="oddrow"><nobr>Working Dir</nobr></td>
  <script>writeTD("rowdata-l");</script>
  <td class="oddrowdata-l">/serves/test_servers</td>
</tr> 
<script>swapRows();</script>
<tr>
  <script>writeTD("row");</script>
  <td class="evenrow"><nobr>Packages/Updates</nobr></td>
  <script>writeTD("rowdata-l");</script>
  <td class="evenrowdata-l"><a href="updates.dsp">View</a></td>
</tr> 
<script>swapRows();</script>
<tr>
  <script>writeTD("row");</script>
  <td class="oddrow"><nobr>Java Data Model</nobr></td>
  <script>writeTD("rowdata-l");</script>
  <td class="oddrowdata-l">64-bit</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>

Thanks in advance.

thclpr asked Mar 03 '26 11:03


1 Answer

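With the HTML you posted stored in a string named content,

import lxml.html as LH

doc = LH.fromstring(content)
# Lazily yield the text of every <td>, in document order.
tds = (td.text_content() for td in doc.xpath('//td'))
# Consume the generator two items at a time: (label, value).
for td, val in zip(*[tds]*2):
    if td in ("Working Dir", "Java Data Model"):
        print(td,val)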

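yields (under Python 2, where print(td,val) prints a tuple; Python 3 prints the two strings separated by a space)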

('Working Dir', '/serves/test_servers')
('Java Data Model', '64-bit')

This line does most of the work:

tds = (td.text_content() for td in doc.xpath('//td'))

It uses the xpath() method to find every <td> element in the document, and the text_content() method to extract each element's text, including text nested inside child tags such as <nobr> or <a>.
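To show why text_content() is used rather than the plain .text attribute, here is a small sketch. The variable names and the one-cell table wrapped around the <td> are assumptions made for illustration; only the cell itself comes from the question.

```python
import lxml.html as LH

# One label cell from the question, wrapped in a table so the <td> sits in a valid context.
snippet = '<table><tr><td class="oddrow"><nobr>Working Dir</nobr></td></tr></table>'
td = LH.fromstring(snippet).xpath('//td')[0]

direct_text = td.text          # only the text directly inside <td>, before any child tag
full_text = td.text_content()  # text of <td> together with all of its descendants

print(repr(direct_text))
print(repr(full_text))
```

Because the label is wrapped in <nobr>, the cell has no text of its own, so .text is None while text_content() still returns the label.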

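zip(*[tds]*2) is the grouper idiom for iterating over tds in pairs. Because tds is a generator, both arguments passed to zip are the same iterator, so each tuple takes two consecutive items: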

for td, val in zip(*[tds]*2):
    print(td,val)
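The pairing can be tried without any HTML at all. This is a minimal sketch in which the list cells is a made-up stand-in for the text of consecutive <td> elements:

```python
# Hypothetical stand-in for the text of consecutive <td> cells.
cells = ["Working Dir", "/serves/test_servers", "Java Data Model", "64-bit"]

it = iter(cells)              # one shared iterator, consumed as zip advances
pairs = list(zip(*[it] * 2))  # same as zip(it, it): two consecutive items per tuple

for label, value in pairs:
    print(label, "->", value)
```

The explicit iter() call matters here. Passing the list itself twice would pair each item with itself, since zip would create two independent iterators over it.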

Note that this assumes the <td> label and value cells strictly alternate. A row with a different number of cells would shift every pair that follows it.
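If the cells might not alternate cleanly, one alternative (not part of the code above) is to walk the table row by row and only accept rows with exactly two cells. The sketch below uses a trimmed copy of the HTML from the question, with the <script> tags left out for brevity; the names wanted and result are assumptions chosen for illustration. It also picks up "Packages/Updates", which the question asked about.

```python
import lxml.html as LH

# Trimmed copy of the HTML from the question (<script> tags omitted).
content = """
<table><tbody>
<tr>
  <td class="oddrow"><nobr>Working Dir</nobr></td>
  <td class="oddrowdata-l">/serves/test_servers</td>
</tr>
<tr>
  <td class="evenrow"><nobr>Packages/Updates</nobr></td>
  <td class="evenrowdata-l"><a href="updates.dsp">View</a></td>
</tr>
<tr>
  <td class="oddrow"><nobr>Java Data Model</nobr></td>
  <td class="oddrowdata-l">64-bit</td>
</tr>
</tbody></table>
"""

doc = LH.fromstring(content)
wanted = {"Working Dir", "Packages/Updates", "Java Data Model"}

result = {}
for row in doc.xpath("//tr"):
    cells = row.xpath("./td")       # only the <td> children of this row
    if len(cells) != 2:             # skip rows that are not label/value pairs
        continue
    label = cells[0].text_content().strip()
    if label in wanted:
        result[label] = cells[1].text_content().strip()

for label, value in result.items():
    print(label, "->", value)
```

For the "Packages/Updates" row this returns the link text ("View"). If the link target is what you need, cells[1].xpath(".//a/@href") would return it instead.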

unutbu answered Mar 05 '26 00:03


