
XPath following-sibling for crawling not returning sibling

I am new to import.io and am trying to create a crawler that extracts attribute data from supplier websites, so that I can audit it against our internal attribute database. I watched a bunch of videos, and although my syntax seems to be right, my manual XPath override isn't returning attribute values. I have the following sample HTML:

<table>
<tbody><tr class="oddRow">
<td class="label">&nbsp;Adhesive Type&lrm;</td><td>&nbsp;Epoxy&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Applications&lrm;</td><td>&nbsp;Hard Disk Drive Component Assembly&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Capabilities&lrm;</td><td>&nbsp;Sustainability&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm;
</td>

I am trying to write an XPath following-sibling expression to grab the value for "Color" through an import.io crawler. The XPath I get when I select "Color" is:

//*[@id="attributeList"]/table/tbody/tr[5]/td[1]

I've tried to use:

//*[@id="attributeList"]/table/tbody/tr/td[.="Color"]/following-sibling::td

But it isn't grabbing the color attribute value from the table. I'm not sure whether it has something to do with the odd and even row classes. When I look at the HTML, it seems to make logical sense: the label is "Color" and the attribute value is in the td element that follows it.

Elizabeth VO asked Jun 05 '15 18:06



1 Answer

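The text in the selected td node contains more than just "Color". It is &nbsp;Color&lrm;, that is, a non-breaking space, the word, and a left-to-right mark. The test .="Color" requires the node's whole string value to equal "Color", so it matches nothing. Instead, you could select td nodes whose text contains the string "Color":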

'//*[@id="attributeList"]/table/tbody/tr/td[contains(text(), "Color")]/following-sibling::td/text()'
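To check the difference outside import.io, here is a minimal sketch using Python's lxml library (my choice for illustration; import.io's own XPath engine may behave differently). It assumes the table sits inside an element with id="attributeList", as the XPath in the question implies, so the wrapping div below is hypothetical. Only three of the sample rows are included.

```python
from lxml import html

# Rows copied from the question. The wrapping <div id="attributeList">
# is an assumption, added so the XPath prefix has something to match.
SAMPLE = """
<div id="attributeList">
<table>
<tbody><tr class="oddRow">
<td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm;
</td>
</tr>
<tr>
<td class="label">&nbsp;Capabilities&lrm;</td><td>&nbsp;Sustainability&lrm;
</td>
</tr>
<tr class="oddRow">
<td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm;
</td>
</tr>
</tbody></table>
</div>
"""

doc = html.fromstring(SAMPLE)
prefix = '//*[@id="attributeList"]/table/tbody/tr/td'

# Exact comparison: the label's string value is "\xa0Color\u200e",
# not "Color", so no td satisfies the predicate.
exact = doc.xpath(prefix + '[.="Color"]/following-sibling::td')

# Substring test: matches the label despite the surrounding characters.
loose = doc.xpath(
    prefix + '[contains(text(), "Color")]/following-sibling::td/text()'
)

# The value carries the same debris; drop the left-to-right mark, then
# strip whitespace (str.strip() also removes the non-breaking space).
value = loose[0].replace("\u200e", "").strip()

print(exact)
print(repr(value))
```

Note that contains() is a substring test, so it would also match a label such as "Color Family" if the supplier's table had one; in that case a stricter predicate would be needed.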
unutbu answered Sep 30 '22 08:09
