
XPath to locate a cell with specific text parsing HTML tables

Hope someone out there can quickly point me in the right direction with my XPath difficulties.

Currently I've got to the point where I can identify the correct table I need in my HTML source, but I then need to process only the rows that contain the text 'Chapter' somewhere in their DOM.

My last attempt was this:

// get the correct table
HtmlTable table = page.getFirstByXPath("//table[2]");

// now the failing bit....
def rows = table.getByXPath("*/td[contains(text(),'Chapter')]") 

I thought the XPath above meant: get me all elements that have a child 'td' element that contains the text 'Chapter' somewhere in its DOM.

An example of a matching row from my source is:

<tr valign="top">
  <td nowrap="" align="Right">
   <font face="Verdana">
   <a href="index.cfm?a=1">Chapter 1</a>
   </font>
  </td>
  <td class="ChapterT">
    <font face="Verdana">DEFINITIONS</font>
  </td>
  <td>&nbsp;</td>
</tr>

Any help / pointers greatly appreciated.

Thanks,

David Brown asked Mar 10 '12 03:03


2 Answers

Use this XPath:

//td[contains(., 'Chapter')]
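
For anyone wondering why this matches where the original expression did not: in XPath 1.0, contains(text(), 'Chapter') converts the td's direct text-node children to a string by taking only the first one, and in the sample row that is just whitespace, because "Chapter 1" sits inside font and a. The dot stands for the string value of the whole td, descendants included. Below is a rough sketch of that difference. It assumes the JDK's built-in javax.xml.xpath rather than HtmlUnit, wraps the sample row in a bare table, and empties the &nbsp; cell so a plain XML parser accepts it. The class name TextVsDot and the count helper are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextVsDot {
    // The sample row from the question, wrapped in a table element.
    // Assumption: the &nbsp; cell is left empty so the markup is valid XML.
    static final String XML =
        "<table><tr valign=\"top\">"
      + "<td nowrap=\"\" align=\"Right\">\n <font face=\"Verdana\">\n"
      + "  <a href=\"index.cfm?a=1\">Chapter 1</a>\n </font>\n</td>"
      + "<td class=\"ChapterT\">\n <font face=\"Verdana\">DEFINITIONS</font>\n</td>"
      + "<td></td>"
      + "</tr></table>";

    // Hypothetical helper: evaluates expr with the table as context node
    // and returns how many nodes were selected.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = doc.getDocumentElement();
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First text-node child of each td is whitespace (or absent): no match.
        System.out.println(count("*/td[contains(text(),'Chapter')]")); // prints 0
        // String value of the first td includes "Chapter 1": one match.
        System.out.println(count("//td[contains(., 'Chapter')]"));     // prints 1
    }
}
```

Note that the second td is not matched by either expression: "ChapterT" there is an attribute value, not text. An HTML parser may also insert a tbody element between table and tr, in which case */td would miss the cells for a second reason.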
Kirill Polishchuk answered Nov 16 '22 02:11


You want all td elements under your current node, not all td elements in the document, which is what the currently accepted answer selects.

Use:

.//td[.//text()[contains(., 'Chapter')]]

This selects all td descendants of the current node that have at least one text node descendant whose string value contains the string "Chapter".
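
To make the scoping point concrete, here is a small sketch with two tables, evaluated with the second table as the context node. It is only an illustration under assumptions: it uses the JDK's javax.xml.xpath instead of HtmlUnit, the two-table document is invented, and RelativeVsAbsolute and countFromSecondTable are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeVsAbsolute {
    // Invented document: each table has one cell mentioning "Chapter".
    static final String XML =
        "<html><body>"
      + "<table><tr><td><a>Chapter 1</a></td></tr></table>"
      + "<table><tr><td><a>Chapter 2</a></td><td>DEFINITIONS</td></tr></table>"
      + "</body></html>";

    // Hypothetical helper: evaluates expr with the SECOND table as context node.
    static int countFromSecondTable(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate("/html/body/table[2]", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A leading // starts at the document root, so both tables are searched.
        System.out.println(countFromSecondTable("//td[.//text()[contains(., 'Chapter')]]"));  // prints 2
        // A leading .// starts at the context node, so only the second table is searched.
        System.out.println(countFromSecondTable(".//td[.//text()[contains(., 'Chapter')]]")); // prints 1
    }
}
```

The same distinction should apply to table.getByXPath(...) in the question, since the table is the context node there, though I have not run this against HtmlUnit itself.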

If it is known in advance that any td under this table only has a single text node, this can be simplified to just:

.//td[contains(., 'Chapter')]
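
The two forms only differ when the word is split across several text nodes inside one cell. A contrived sketch of that case follows; the markup is not from the question's source, it uses the JDK's javax.xml.xpath rather than HtmlUnit, and TextNodeVsStringValue and count are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeVsStringValue {
    // Contrived table: in the first cell "Chapter" is split into the text
    // nodes "Chap" and "ter"; in the second it is a single text node.
    static final String XML =
        "<table><tr>"
      + "<td>Chap<b>ter</b> 1</td>"
      + "<td><a>Chapter 2</a></td>"
      + "</tr></table>";

    // Hypothetical helper: evaluates expr with the table as context node.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = doc.getDocumentElement();
            NodeList hits = (NodeList) xpath.evaluate(expr, table, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Requires one single text node that contains "Chapter": only the second cell.
        System.out.println(count(".//td[.//text()[contains(., 'Chapter')]]")); // prints 1
        // Tests the concatenated string value of the cell: both cells.
        System.out.println(count(".//td[contains(., 'Chapter')]"));            // prints 2
    }
}
```

For the row shown in the question, where "Chapter 1" is one text node inside the link, both expressions select the same cell.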
Dimitre Novatchev answered Nov 16 '22 00:11